Abstract. Bayesian methods are increasingly applied these days in
the theory and practice of statistics. Any Bayesian inference depends
on a likelihood and a prior. Ideally one would like to elicit a prior from
related sources of information or past data. However, in their absence,
Bayesian methods need to rely on some "objective" or "default" priors,
and the resulting posterior inference can still be quite valuable.

Not surprisingly, over the years, the catalog of objective priors also 
has become prohibitively large, and one has to set some specific criteria 
for the selection of such priors. Our aim is to review some of these cri- 
teria, compare their performance, and illustrate them with some simple 
examples. While for very large sample sizes it possibly does not
matter which objective prior one uses, the selection of such a prior does
influence inference for small or moderate samples. For regular models
where asymptotic normality holds, Jeffreys' general rule prior, the pos- 
itive square root of the determinant of the Fisher information matrix, 
enjoys many optimality properties in the absence of nuisance parame- 
ters. In the presence of nuisance parameters, however, there are many 
other priors which emerge as optimal depending on the criterion se- 
lected. One new feature in this article is that a prior different from 
Jeffreys' is shown to be optimal under the chi-square divergence cri- 
terion even in the absence of nuisance parameters. The latter is also 
invariant under one-to-one reparameterization. 

Key words and phrases: Asymptotic expansion, divergence criterion, 
first-order probability matching, Jeffreys' prior, left Haar priors, loca- 
tion family, location-scale family, multiparameter, orthogonality, ref- 
erence priors, right Haar priors, scale family, second-order probability 
matching, shrinkage argument. 

1. INTRODUCTION

Bayesian methods have been increasingly used in recent
years in the theory and practice of statistics. Their
implementation requires specification of both a likelihood
and a prior. With enough historical data, it
is possible to elicit a prior distribution fairly accurately.
However, even in the absence of such data, Bayesian
methods, if judiciously used, can produce meaningful
inferences based on the so-called "objective" or "default"
priors.

The main focus of this article is to introduce certain
objective priors which could be potentially useful
even for frequentist inference. One such example,
where frequentists are yet to reach a consensus
about an "optimal" approach, is the construction
of confidence intervals for the ratio of two normal
means, the celebrated Fieller-Creasy problem. It is
shown in Section 4 of this paper how an "objective"
prior produces a credible interval in this case
which meets the target coverage probability of a
frequentist confidence interval even for small or moderate
sample sizes. Another situation, which has often
become a real challenge for frequentists, is to
find a suitable method for elimination of nuisance
parameters when the dimension of the parameter
grows in direct proportion to the sample size. This
is what is usually referred to as the Neyman-Scott
phenomenon. We will illustrate in Section 3, with an
example, how an objective prior can sometimes
overcome this problem.

Before getting into the main theme of this paper,
we recount briefly the early history of objective priors.
One of the earliest uses is usually attributed to
Bayes (1763) and Laplace (1812), who recommended
using a uniform prior for the binomial proportion $p$
in the absence of any other information. While
intuitively quite appealing, this prior has often been
criticized due to its lack of invariance under one-to-one
reparameterization. For example, a uniform
prior for $p$ in the binomial case does not result in
a uniform prior for $p^2$. A more compelling example
is that a uniform prior for $\sigma$, the population
standard deviation, does not result in a uniform prior
for $\sigma^2$, and the converse is also true. In a situation
like this, it is not at all clear whether there can be
any preference to assign a uniform prior to either $\sigma$
or $\sigma^2$.

In contrast, Jeffreys' (1961) general rule prior,
namely, the positive square root of the determinant
of the Fisher information matrix, is invariant under
one-to-one reparameterization of parameters. We
will motivate this prior from several asymptotic
considerations. In particular, for regular models where
asymptotic normality holds, Jeffreys' prior enjoys
many optimality properties in the absence of nuisance
parameters. In the presence of nuisance parameters,
this prior suffers from many problems, the
marginalization paradox and the Neyman-Scott problem,
to name just two. Indeed, for the location-scale
models, Jeffreys himself recommended alternative
priors.
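
The invariance claim can be checked numerically. The following is a minimal sketch, assuming a Bernoulli model with success probability p and the logit reparameterization phi = log{p/(1 - p)}; both choices, and all function names, are illustrative assumptions rather than anything taken from the text. It compares Jeffreys' prior computed directly in p with the one computed in phi and mapped back to p through the Jacobian, and contrasts this with a flat prior on phi, which does not stay flat in p.

```python
import math

def fisher_info_p(p):
    # Fisher information of one Bernoulli(p) observation, in the p scale.
    return 1.0 / (p * (1.0 - p))

def fisher_info_logit(phi):
    # Same model in the logit scale phi = log(p / (1 - p)): I(phi) = p (1 - p).
    p = 1.0 / (1.0 + math.exp(-phi))
    return p * (1.0 - p)

def jeffreys_in_p_direct(p):
    # Jeffreys' rule applied directly in the p scale (unnormalized).
    return math.sqrt(fisher_info_p(p))

def jeffreys_in_p_via_logit(p):
    # Jeffreys' rule applied in the logit scale, then mapped back to p
    # with the Jacobian |d phi / d p| = 1 / (p (1 - p)).
    phi = math.log(p / (1.0 - p))
    return math.sqrt(fisher_info_logit(phi)) / (p * (1.0 - p))

def flat_logit_prior_in_p(p):
    # A flat prior on phi mapped back to p: it is not flat in p.
    return 1.0 / (p * (1.0 - p))

grid = [0.1, 0.3, 0.5, 0.7, 0.9]
jeffreys_ratios = [jeffreys_in_p_via_logit(p) / jeffreys_in_p_direct(p) for p in grid]
flat_values = [flat_logit_prior_in_p(p) for p in grid]
print(jeffreys_ratios)   # ratios equal to 1 up to rounding: same prior either way
print(flat_values)       # varies with p: a flat prior is not invariant
```

The two routes to Jeffreys' prior agree at every grid point, whereas the flat prior changes shape under the same transformation.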

There are several criteria for the construction of
objective priors. The present article primarily reviews
two of these criteria in some detail, namely,
"divergence priors" and "probability matching priors,"
and finds optimal priors under these criteria.
The class of divergence priors includes "reference
priors" introduced by Bernardo (1979). The "probability
matching priors" were introduced by Welch
and Peers (1963). There have been many generalizations
of these in the past two decades. The development
of both these classes of priors relies on asymptotic
considerations. Somewhat more briefly, we also discuss
a few other priors including the "right" and "left"
Haar priors.

The paper does not claim to be as extensive, thorough
and comprehensive as the review of Kass and Wasserman
(1996), nor does it aspire to the somewhat narrowly
focused, but very comprehensive, reviews of probability
matching priors given in Ghosh and Mukerjee
(1998), Datta and Mukerjee (2004) and Datta
and Sweeting (2005). A very comprehensive review
of reference priors is now available in Bernardo (2005),
and a unified approach is given in the recent article
of Berger, Bernardo and Sun (2009).

While primarily a review, the present article has 
been able to unify as well as generalize some of the 
previously considered criteria, for example, viewing 
the reference priors as members of a bigger class of 
divergence priors. Interestingly, with some of these 
criteria as presented here, it is possible to construct 
some alternatives to Jeffreys' prior even in the ab- 
sence of nuisance parameters. 

The outline of the remaining sections is as fol- 
lows. In Section 2 we introduce two basic tools to 
be used repeatedly in the subsequent sections. One 
such tool involving asymptotic expansion of the pos- 
terior density is due to Johnson (1970), and Ghosh, 
Sinha and Joshi (1982), and is discussed quite ex- 
tensively in Ghosh, Delampady and Samanta (2006) 
and Datta and Mukerjee (2004). The second tool 
involves a shrinkage argument suggested by Dawid 
and used extensively by J. K. Ghosh and his co- 
authors. It is shown in Section 3 that this shrinkage 
argument can also be used in deriving priors with 
the criterion of maximizing the distance between the 
prior and the posterior. The distance measure used 
includes, but is not limited to, the Kullback-Leibler 
(K-L) distance considered in Bernardo (1979) for 
constructing two-group "reference priors." Also, in
this section we have considered a new prior, different
from Jeffreys' even in the one-parameter case, which
is also invariant under one-to-one reparameterization.
Section 4 addresses construction of priors under
probability matching criteria. Certain other priors
are introduced in Section 5, and it is pointed
out that some of these priors can often provide exact
and not just asymptotic matching. Some final
remarks are made in Section 6.

Throughout this paper the results are presented 
more or less in a heuristic fashion, that is, with- 
out paying much attention to the regularity condi- 
tions needed to justify these results. More emphasis 
is placed on the application of these results in the 
construction of objective priors. 

2. TWO BASIC TOOLS 

The study of asymptotic expansions of the posterior
density began with Johnson (1970), followed up later
by Ghosh, Sinha and Joshi (1982), and many others.
The result goes beyond the theorem of
Bernstein and von Mises, which provides asymptotic
normality of the posterior density. Typically, such
an expansion is centered around the MLE (and
occasionally the posterior mode), and requires only
derivatives of the log-likelihood with respect to the
parameters, evaluated at their MLEs. These
expansions are available even for heavy-tailed den- 
sities such as Cauchy because finiteness of moments 
of the distribution is not needed. The result goes 
a long way in finding asymptotic expansion for the 
posterior moments of parameters of interest as well 
as in finding asymptotic posterior predictive distri- 
butions. 

The asymptotic expansion of the posterior resembles
an Edgeworth expansion, but, unlike
the latter, this approach does not require the cumulants
of the distribution. Finding cumulants, though
conceptually easy, can become quite formidable,
especially in the presence of multiple parameters,
demanding evaluation of mixed cumulants.

We have used this expansion as a first step in the 
derivation of objective priors under different crite- 
ria. Together with the shrinkage argument as men- 
tioned earlier in the Introduction, and to be dis- 
cussed later in this section, one can easily unify and 
extend many of the known results on prior selec- 
tion. In particular, we will see later in this section 
how some of the reference priors of Bernardo (1979) 
can be found via application of these two tools. The 
approach also leads to a somewhat surprising result 
involving asymptotic expansion of the distribution 
function of the MLE in a fairly general setup, and 
is not restricted to any particular family of distri- 
butions, for example, the exponential family, or the 
location-scale family. A detailed exposition is avail- 
able in Datta and Mukerjee (2004, pages 5-8). 

For simplicity of exposition, we consider primarily
the one-parameter case. Results needed for the
multiparameter case will occasionally be mentioned,
and, in most cases, these are straightforward, albeit
often cumbersome, extensions of one-parameter
results. Moreover, as stated in the Introduction, the
results will be given without full rigor, that is,
without any specific mention of the needed regularity
conditions.

We begin with $X_1,\ldots,X_n \mid \theta$ i.i.d. with common
p.d.f. $f(X\mid\theta)$. Let $\hat\theta_n$ denote the MLE of $\theta$. The
likelihood function is denoted by $L_n(\theta)=\prod_{i=1}^n f(X_i\mid\theta)$
and let $\ell_n(\theta)=\log L_n(\theta)$. Let
$a_i = n^{-1}[d^i\ell_n(\theta)/d\theta^i]_{\theta=\hat\theta_n}$, $i=1,2,\ldots$,
and let $\hat I_n=-a_2$, the observed per unit Fisher information
number. Consider a twice differentiable prior $\pi$. Let
$T_n=\sqrt{n}(\theta-\hat\theta_n)\hat I_n^{1/2}$, and
let $\pi_n^*(t)$ denote the posterior p.d.f. of $T_n$ given
$X_1,\ldots,X_n$. Then, under certain regularity conditions,
we have the following result.

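Theorem 1. $\pi_n^*(t)=\phi(t)[1+n^{-1/2}\gamma_1(t;X_1,\ldots,X_n)
+n^{-1}\gamma_2(t;X_1,\ldots,X_n)]+O_p(n^{-3/2})$, where $\phi(t)$
is the standard normal p.d.f.,

$$
\gamma_1(t;X_1,\ldots,X_n)=\frac{a_3t^3}{6\hat I_n^{3/2}}
+\frac{t}{\hat I_n^{1/2}}\,\frac{\pi'(\hat\theta_n)}{\pi(\hat\theta_n)}
$$

and

$$
\begin{aligned}
\gamma_2(t;X_1,\ldots,X_n)
&=\frac{a_4t^4}{24\hat I_n^{2}}+\frac{a_3^2t^6}{72\hat I_n^{3}}
+\frac{t^2}{2\hat I_n}\,\frac{\pi''(\hat\theta_n)}{\pi(\hat\theta_n)}
+\frac{a_3t^4}{6\hat I_n^{2}}\,\frac{\pi'(\hat\theta_n)}{\pi(\hat\theta_n)}\\
&\quad-\frac{a_4}{8\hat I_n^{2}}-\frac{15a_3^2}{72\hat I_n^{3}}
-\frac{1}{2\hat I_n}\,\frac{\pi''(\hat\theta_n)}{\pi(\hat\theta_n)}
-\frac{a_3}{2\hat I_n^{2}}\,\frac{\pi'(\hat\theta_n)}{\pi(\hat\theta_n)}.
\end{aligned}
$$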

The proof is given in Ghosh, Delampady and Samanta
(2006, pages 107-108). The statement there involves
a few minor typos which can be corrected easily. We
outline here only a few key steps needed in the proof.

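We begin with the posterior p.d.f.,

$$
\pi(\theta\mid X_1,\ldots,X_n)
=\exp[\ell_n(\theta)]\pi(\theta)\Big/\int\exp[\ell_n(\theta)]\pi(\theta)\,d\theta.
\tag{2.1}
$$

Substituting $t=\sqrt{n}(\theta-\hat\theta_n)\hat I_n^{1/2}$, the posterior p.d.f.
of $T_n$ is given by

$$
\pi_n^*(t)=C_n^{-1}\exp[\ell_n\{\hat\theta_n+t(n\hat I_n)^{-1/2}\}-\ell_n(\hat\theta_n)]\,
\pi\{\hat\theta_n+t(n\hat I_n)^{-1/2}\},
\tag{2.2}
$$

where

$$
C_n=\int\exp[\ell_n\{\hat\theta_n+t(n\hat I_n)^{-1/2}\}-\ell_n(\hat\theta_n)]\,
\pi\{\hat\theta_n+t(n\hat I_n)^{-1/2}\}\,dt.
$$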
The rest of the proof involves a Taylor expansion
of $\exp[\ell_n\{\hat\theta_n+t(n\hat I_n)^{-1/2}\}]$ and
$\pi\{\hat\theta_n+t(n\hat I_n)^{-1/2}\}$
around $\hat\theta_n$ up to a desired order, and collecting the
coefficients of $n^{-1/2}$, $n^{-1}$, etc. The other component
is evaluation of $C_n$ via moments of the $N(0,1)$ distribution.

Remark 1. The above result is useful in finding
certain expansions for the posterior moments as
well. In particular, noting $\theta=\hat\theta_n+(n\hat I_n)^{-1/2}T_n$, it
follows that the asymptotic expansion of the posterior
mean of $\theta$ is given by

$$
E(\theta\mid X_1,\ldots,X_n)=\hat\theta_n
+n^{-1}\left\{\frac{a_3}{2\hat I_n^{2}}
+\frac{1}{\hat I_n}\,\frac{\pi'(\hat\theta_n)}{\pi(\hat\theta_n)}\right\}
+O_p(n^{-3/2}).
\tag{2.3}
$$

Also, $V(\theta\mid X_1,\ldots,X_n)=(n\hat I_n)^{-1}+O_p(n^{-3/2})$.
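
As a sanity check on Theorem 1 and (2.3), here is a minimal numerical sketch. It assumes, purely for illustration, an exponential model with rate theta and the prior pi(theta) = exp(-theta), for which the posterior is exactly Gamma(n + 1, S + 1) with S the sum of the observations; the model, the prior and the values of n and S are assumptions made here and do not come from the text. In this model the MLE is n/S, the observed information is 1/MLE^2, a_3 = 2/MLE^3 and pi'/pi = -1.

```python
import math

n, S = 50, 40.0                 # assumed sample size and sum of observations
theta_hat = n / S               # MLE of the exponential rate
I_hat = 1.0 / theta_hat**2      # observed per-unit information, -a_2
a3 = 2.0 / theta_hat**3         # n^{-1} times the third log-likelihood derivative at the MLE
dlog_prior = -1.0               # pi'(theta)/pi(theta) for the assumed prior exp(-theta)

def phi(t):
    # Standard normal density.
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def exact_density_T(t):
    # Exact posterior density of T_n = sqrt(n) (theta - theta_hat) I_hat^{1/2};
    # the posterior of theta is Gamma(shape n + 1, rate S + 1).
    theta = theta_hat + t / math.sqrt(n * I_hat)
    if theta <= 0.0:
        return 0.0
    shape, rate = n + 1.0, S + 1.0
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1.0) * math.log(theta) - rate * theta)
    return math.exp(log_pdf) / math.sqrt(n * I_hat)

def gamma1(t):
    # First-order correction term of Theorem 1.
    return a3 * t**3 / (6.0 * I_hat**1.5) + t * dlog_prior / math.sqrt(I_hat)

grid = [-2.0 + 0.1 * j for j in range(41)]
err_normal = max(abs(exact_density_T(t) - phi(t)) for t in grid)
err_first_order = max(
    abs(exact_density_T(t) - phi(t) * (1.0 + gamma1(t) / math.sqrt(n)))
    for t in grid
)

# Posterior mean: exact value versus the expansion (2.3) without its remainder.
exact_mean = (n + 1.0) / (S + 1.0)
approx_mean = theta_hat + (a3 / (2.0 * I_hat**2) + dlog_prior / I_hat) / n
print(err_normal, err_first_order, exact_mean, approx_mean)
```

On this grid the first-order correction should cut the maximum density error well below that of the plain normal approximation, and the two posterior means should agree closely.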

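A multiparameter extension of Theorem 1 is as
follows. Suppose that $\theta=(\theta_1,\ldots,\theta_p)^T$ is the parameter
vector and $\hat\theta_n$ is the MLE of $\theta$. Let

$$
\hat I_{njr}=-\frac{1}{n}\,
\frac{\partial^2\ell_n(\theta)}{\partial\theta_j\,\partial\theta_r}\bigg|_{\theta=\hat\theta_n},
\qquad
a_{jrs}=\frac{1}{n}\,
\frac{\partial^3\ell_n(\theta)}{\partial\theta_j\,\partial\theta_r\,\partial\theta_s}\bigg|_{\theta=\hat\theta_n},
$$

and $\hat I_n=((\hat I_{njr}))$. Then retaining only up to the
$O(n^{-1/2})$ term, the posterior of $W_n=\sqrt{n}(\theta-\hat\theta_n)$
is given by

$$
\begin{aligned}
\pi_n^*(w)&=(2\pi)^{-p/2}|\hat I_n|^{1/2}\exp[-(1/2)w^T\hat I_nw]\\
&\quad\times\Biggl[1+n^{-1/2}\Biggl\{\sum_{j=1}^{p}w_j
\frac{\partial\log\pi}{\partial\theta_j}\bigg|_{\theta=\hat\theta_n}
+\frac16\sum_{j,r,s}w_jw_rw_sa_{jrs}\Biggr\}+O_p(n^{-1})\Biggr].
\end{aligned}
\tag{2.4}
$$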



Next we present the basic shrinkage argument of
J. K. Ghosh discussed in detail in Datta and Mukerjee
(2004). The prime objective here is evaluation
of $E[q(X,\theta)\mid\theta]=\lambda(\theta)$, say, where $X$ and $\theta$ can be
real- or vector-valued. The idea is to find first
$\int\lambda(\theta)\bar\pi_m(\theta)\,d\theta$ through a sequence of priors
$\{\bar\pi_m(\theta)\}$ defined on a compact set, and then to shrink the
prior to degeneracy at some interior point, say, $\theta$ of
the compact set. The interesting point is that one
never needs explicit specification of $\bar\pi_m(\theta)$ in carrying
out this evaluation. We will see several illustrations
of this in this article.

First, we present the shrinkage argument in a nutshell.
Consider a proper prior $\bar\pi(\cdot)$ with a compact
rectangle as its support in the parameter space, such that
$\bar\pi(\cdot)$ vanishes on the boundary of the support, while
remaining positive in the interior. The support of $\bar\pi(\cdot)$
is the closure of the set. Consider the posterior of $\theta$
under $\bar\pi(\cdot)$ and, hence, obtain $E^{\bar\pi}[q(X,\theta)\mid X]$. Then
find $E[\{E^{\bar\pi}(q(X,\theta)\mid X)\}\mid\theta]=\lambda(\theta)$ for $\theta$ in the
interior of the support of $\bar\pi(\cdot)$. Finally, integrate $\lambda(\cdot)$
with respect to $\bar\pi(\cdot)$, and then allow $\bar\pi(\cdot)$ to
converge to the degenerate prior at the true value of $\theta$ at
an interior point of the support of $\bar\pi(\theta)$. This yields
$E[q(X,\theta)\mid\theta]$. The calculation assumes integrability
of $q(X,\theta)$ over the joint distribution of $X$ and $\theta$.
Such integrability allows change in the order of
integration.

When executed up to the desired order of approximation,
under suitable assumptions, these steps can
lead to significant reduction in the algebra underlying
higher order frequentist asymptotics. The simplification
arises on two counts. First, although
requires Edgeworth type assumptions, it avoids an 
explicit Edgeworth expansion involving calculation 
of approximate cumulants. Second, as we will see, it 
helps establish the results in an easily interpretable 
compact form. The following two sections will de- 
monstrate multiple usage of these two basic tools. 

3. OBJECTIVE PRIORS VIA MAXIMIZATION 
OF THE DISTANCE BETWEEN THE PRIOR 
AND THE POSTERIOR 

3.1 Reference Priors 

We begin with an alternative derivation of the reference
prior of Bernardo. Following Lindley (1956),
Bernardo (1979) suggested a Kullback-Leibler (K-L)
divergence between the prior and the posterior,
namely, $E[\log\{\pi(\theta\mid X)/\pi(\theta)\}]$, where expectation is taken over
the joint distribution of $X$ and $\theta$. The target is to
find a prior $\pi$ which maximizes the above distance.
It is shown in Berger and Bernardo (1989) that if
one does this maximization for a fixed $n$, this may
lead to a discrete prior with finitely many jumps,
a far cry from a diffuse prior. Hence, one needs an
asymptotic maximization.
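First write $E[\log\{\pi(\theta\mid X)/\pi(\theta)\}]$ as

$$
\begin{aligned}
E\left[\log\frac{\pi(\theta\mid X)}{\pi(\theta)}\right]
&=\int\!\!\int\log\left\{\frac{\pi(\theta\mid X)}{\pi(\theta)}\right\}
\pi(\theta\mid X)\,m^{\pi}(X)\,d\theta\,dX\\
&=\int\!\!\int\log\left\{\frac{\pi(\theta\mid X)}{\pi(\theta)}\right\}
L_n(\theta)\pi(\theta)\,dX\,d\theta\\
&=\int\pi(\theta)\,E\left[\log\frac{\pi(\theta\mid X)}{\pi(\theta)}\,\bigg|\,\theta\right]d\theta,
\end{aligned}
\tag{3.1}
$$

where $X=(X_1,\ldots,X_n)$, $L_n(\theta)=\prod_{i=1}^n f(X_i\mid\theta)$, the
likelihood function, and $m^{\pi}(X)$ denotes the marginal
of $X$ after integrating out $\theta$. The integrations are
carried out with respect to a prior $\pi$ having a compact
support, and subsequently passing on to the
limit as and when necessary.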

Without any nuisance parameters, Bernardo 
(1979) showed somewhat heuristically that Jeffreys' 
prior achieves the necessary maximization. A more 
rigorous proof was supplied later by Clarke and Bar- 
ron (1990, 1994). We demonstrate heuristically how 
the shrinkage argument can also lead to the refer- 
ence priors derived in Bernardo (1979). To this end, 
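we first consider the one-parameter case for a regular
family of distributions. We rewrite

$$
E\left[\log\frac{\pi(\theta\mid X)}{\pi(\theta)}\right]
=\int\pi(\theta)E[\log\pi(\theta\mid X)\mid\theta]\,d\theta
-\int\pi(\theta)\log\pi(\theta)\,d\theta.
\tag{3.2}
$$

Next we write

$$
E^{\pi}[\log\pi(\theta\mid X)\mid X]=\int\log\pi(\theta\mid X)\,\pi(\theta\mid X)\,d\theta.
$$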

From the asymptotic expansion of the posterior, one
gets

$$
\log\pi(\theta\mid X)=(1/2)\log(n)-(1/2)\log(2\pi)
-\frac{n(\theta-\hat\theta_n)^2}{2}\hat I_n+(1/2)\log(\hat I_n)+O_p(n^{-1/2}).
$$

Since $n(\theta-\hat\theta_n)^2\hat I_n$ converges a posteriori to a $\chi^2_1$
distribution as $n\to\infty$, irrespective of a prior $\pi$, by the
Bernstein-von Mises and Slutsky's theorems, one
gets

$$
E^{\pi}[\log\pi(\theta\mid X)]=(1/2)\log(n)-(1/2)\log(2\pi e)
+(1/2)\log(\hat I_n)+O_p(n^{-1/2}).
\tag{3.3}
$$

Since the leading term in the right-hand side of (3.3)
does not involve the prior $\pi$, and $\hat I_n$ converges almost
surely ($P_\theta$) to $I(\theta)$, applying the shrinkage argument,
one gets from (3.3)

$$
E[\log\pi(\theta\mid X)\mid\theta]=(1/2)\log(n)-(1/2)\log(2\pi e)
+\log(I^{1/2}(\theta))+O(n^{-1/2}).
\tag{3.4}
$$

In view of (3.2), considering only the leading terms
in (3.4), one needs to find a prior $\pi$ which maximizes
$\int\log\{I^{1/2}(\theta)/\pi(\theta)\}\pi(\theta)\,d\theta$. The integral being nonpositive
due to the property of the Kullback-Leibler information
number, its maximum value is zero, which
is attained for $\pi(\theta)=I^{1/2}(\theta)$, leading once again to
Jeffreys' prior.
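
The maximization can be illustrated numerically. The sketch below assumes a Bernoulli(p) model restricted to the compact set [0.05, 0.95] and a handful of candidate priors; the model, the support, the grid and the candidates are all illustrative assumptions. It evaluates the leading prior-dependent term for each candidate by a midpoint rule and reports which one is largest. Because the square root of the information is left unnormalized here, the maximum equals the log of its normalizing constant rather than zero.

```python
import math

LO, HI, M = 0.05, 0.95, 2000          # assumed compact support and grid size
H = (HI - LO) / M
GRID = [LO + (j + 0.5) * H for j in range(M)]   # midpoint-rule nodes

def root_info(p):
    # I^{1/2}(p) for one Bernoulli(p) observation.
    return 1.0 / math.sqrt(p * (1.0 - p))

def normalize(values):
    # Scale grid values so that they integrate to one over [LO, HI].
    total = sum(values) * H
    return [v / total for v in values]

def objective(prior):
    # Integral of log{I^{1/2}(p) / pi(p)} pi(p) dp on the grid.
    return sum(math.log(root_info(p) / w) * w for p, w in zip(GRID, prior)) * H

candidates = {
    "jeffreys": normalize([root_info(p) for p in GRID]),
    "uniform": normalize([1.0 for _ in GRID]),
    "beta(2,2) shape": normalize([p * (1.0 - p) for p in GRID]),
    "beta(1/4,1/4) shape": normalize([(p * (1.0 - p)) ** -0.75 for p in GRID]),
}
scores = {name: objective(prior) for name, prior in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Among these candidates the normalized Jeffreys prior attains the largest value, in line with the Kullback-Leibler argument.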

The multiparameter generalization of the above
result without any nuisance parameters is based on
the asymptotic expansion

$$
E\left[\log\frac{\pi(\theta\mid X)}{\pi(\theta)}\right]
=(p/2)\log(n)-(p/2)\log(2\pi e)
+\int\log\{|I(\theta)|^{1/2}/\pi(\theta)\}\pi(\theta)\,d\theta+O(n^{-1/2}),
$$

and maximization of the leading term yields once
again Jeffreys' general rule prior $\pi(\theta)=|I(\theta)|^{1/2}$.

In the presence of nuisance parameters, however, 
Jeffreys' general rule prior is no longer the distance 
maximizer. We will demonstrate this in the case 
when the parameter vector is split into two groups, 
one group consisting of the parameters of interest, 
and the other involving the nuisance parameters. 
In particular, Bernardo's (1979) two-group reference 
prior will be included as a special case. 

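To this end, suppose $\theta=(\theta_1,\theta_2)$, where $\theta_1$ ($p_1\times1$)
is the parameter of interest and $\theta_2$ ($p_2\times1$) is the
nuisance parameter. We partition the Fisher information
matrix $I(\theta)$ as

$$
I(\theta)=\begin{pmatrix}I_{11}(\theta)&I_{12}(\theta)\\ I_{21}(\theta)&I_{22}(\theta)\end{pmatrix}.
$$

First begin with a general conditional prior
$\pi(\theta_2\mid\theta_1)=\phi(\theta)$ (say). Bernardo (1979) considered
$\phi(\theta)=|I_{22}(\theta)|^{1/2}$. The marginal prior $\pi(\theta_1)$ for $\theta_1$ is
then obtained by maximizing the distance
$E[\log\{\pi(\theta_1\mid X)/\pi(\theta_1)\}]$. We begin by writing

$$
\log\frac{\pi(\theta_1\mid X)}{\pi(\theta_1)}
=\log\frac{\pi(\theta\mid X)}{\pi(\theta)}
-\log\frac{\pi(\theta_2\mid\theta_1,X)}{\pi(\theta_2\mid\theta_1)}.
\tag{3.5}
$$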
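Writing $\pi(\theta)=\pi(\theta_1)\phi(\theta)$ and $|I(\theta)|=|I_{22}(\theta)|\cdot|I_{11.2}(\theta)|$,
where $I_{11.2}(\theta)=I_{11}(\theta)-I_{12}(\theta)I_{22}^{-1}(\theta)I_{21}(\theta)$,
the asymptotic expansion and the shrinkage argument
together yield

$$
\begin{aligned}
E\left[\log\frac{\pi(\theta\mid X)}{\pi(\theta)}\right]
&=(p/2)\log(n)-(p/2)\log(2\pi e)\\
&\quad+\int\pi(\theta_1)\left\{\int\phi(\theta)
\log\frac{|I_{22}(\theta)|^{1/2}|I_{11.2}(\theta)|^{1/2}}{\pi(\theta_1)\phi(\theta)}\,d\theta_2\right\}d\theta_1
+O(n^{-1/2})
\end{aligned}
\tag{3.6}
$$

and

$$
\begin{aligned}
E\left[\log\frac{\pi(\theta_2\mid\theta_1,X)}{\pi(\theta_2\mid\theta_1)}\right]
&=(p_2/2)\log(n)-(p_2/2)\log(2\pi e)\\
&\quad+\int\pi(\theta_1)\left\{\int\phi(\theta)
\log\frac{|I_{22}(\theta)|^{1/2}}{\phi(\theta)}\,d\theta_2\right\}d\theta_1
+O(n^{-1/2}).
\end{aligned}
\tag{3.7}
$$

From (3.5)-(3.7), retaining only the leading term,

$$
\begin{aligned}
E\left[\log\frac{\pi(\theta_1\mid X)}{\pi(\theta_1)}\right]
&=(p_1/2)\log(n)-(p_1/2)\log(2\pi e)\\
&\quad+\int\pi(\theta_1)\left\{\int\phi(\theta)
\log\frac{|I_{11.2}(\theta)|^{1/2}}{\pi(\theta_1)}\,d\theta_2\right\}d\theta_1.
\end{aligned}
\tag{3.8}
$$

Writing $\log\psi(\theta_1)=\int\phi(\theta)\log|I_{11.2}(\theta)|^{1/2}\,d\theta_2$, once
again by property of the Kullback-Leibler information
number, it follows that the maximizing prior is
$\pi(\theta_1)=\psi(\theta_1)$.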

We have purposely not set limits for these integrals.
An important point to note [as pointed out in
Berger and Bernardo (1989)] is that evaluation of all
these integrals is carried out over an increasing
sequence of compact sets $K_i$ whose union is the entire
parameter space. This is because most often we are
working with improper priors, and direct evaluation
of these integrals over the entire parameter space
will simply give $+\infty$, which does not help in finding
any prior. As an illustration, if the parameter space
is $\mathcal{R}\times\mathcal{R}^{+}$, as is typically the case for the location-scale
family of distributions, then one can take the
increasing sequence of compact sets as $[-i,i]\times[i^{-1},i]$,
$i\ge2$. All the proofs are usually carried out by taking
a sequence of priors $\pi_i$ with compact support $K_i$,
and eventually making $i\to\infty$. This important point
should be borne in mind in the actual derivation
of reference priors. We will now illustrate this for
the location-scale family of distributions when one
of the two parameters is the parameter of interest,
while the other one is the nuisance parameter.

Example 1 (Location-scale models). Suppose
$X_1,\ldots,X_n$ are i.i.d. with common p.d.f. $\sigma^{-1}f((x-\mu)/\sigma)$,
where $\mu\in(-\infty,\infty)$ and $\sigma\in(0,\infty)$. Consider
the sequence of priors $\pi_i$ with support $[-i,i]\times[i^{-1},i]$,
$i=2,3,\ldots.$ We may note that

$$
I(\mu,\sigma)=\sigma^{-2}\begin{pmatrix}c_1&c_2\\ c_2&c_3\end{pmatrix},
$$

where the constants $c_1$, $c_2$ and $c_3$ are
functions of $f$ and do not involve either $\mu$ or $\sigma$. So,
if $\mu$ is the parameter of interest, and $\sigma$ is the nuisance
parameter, following Bernardo's (1979) prescription,
one begins with the sequence of priors
$\pi_{i2}(\sigma\mid\mu)=k_{i2}\sigma^{-1}$ where, solving
$1=k_{i2}\int_{i^{-1}}^{i}\sigma^{-1}\,d\sigma$,
one gets $k_{i2}=(2\log i)^{-1}$. Next one finds the prior
$\pi_{i1}(\mu)=k_{i1}\exp[\int_{i^{-1}}^{i}k_{i2}\sigma^{-1}\log(\sigma^{-1})\,d\sigma]$,
which is a constant not depending on either $\mu$ or $\sigma$. Hence, the
resulting joint prior $\pi_i(\mu,\sigma)=\pi_{i1}(\mu)\pi_{i2}(\sigma\mid\mu)\propto\sigma^{-1}$,
which is the desired reference prior. Incidentally,
this is Jeffreys' independence prior rather than
Jeffreys' general rule prior, the latter being proportional
to $\sigma^{-2}$. Conversely, when $\sigma$ is the parameter
of interest and $\mu$ is the nuisance parameter, one
begins with $\pi_{i2}(\mu\mid\sigma)=(2i)^{-1}$ and then, following
Bernardo (1979) again, one finds
$\pi_{i1}(\sigma)=c_{i1}\exp[\int_{-i}^{i}(2i)^{-1}\log(1/\sigma)\,d\mu]\propto\sigma^{-1}$.
Thus, once again one gets Jeffreys' independence prior. We will
see in Section 5 that Jeffreys' independence prior is
a right Haar prior, while Jeffreys' general rule prior
is a left Haar prior for the location-scale family of
distributions.

Example 2 (Noncentrality parameter). Let
$X_1,\ldots,X_n\mid\mu,\sigma$ be i.i.d. $N(\mu,\sigma^2)$, where $\mu$ real and $\sigma(>0)$
are both unknown. Suppose the parameter of interest
is $\theta=\mu/\sigma$, the noncentrality parameter. With
the reparameterization $(\theta,\sigma)$ from $(\mu,\sigma)$, the likelihood
is rewritten as
$L(\theta,\sigma)\propto\sigma^{-n}\exp[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(X_i-\theta\sigma)^2]$.
Then the per observation Fisher information matrix is given by

$$
I(\theta,\sigma)=\begin{pmatrix}1&\theta/\sigma\\ \theta/\sigma&(\theta^2+2)/\sigma^2\end{pmatrix}.
$$

Consider once again the sequence of priors $\pi_i$ with
support $[-i,i]\times[i^{-1},i]$, $i=2,3,\ldots.$ Again, following
Bernardo, $\pi_{i2}(\sigma\mid\theta)=k_{i2}\sigma^{-1}$, where $k_{i2}=(2\log i)^{-1}$.
Noting that $I_{11.2}(\theta,\sigma)=1-\theta^2/(\theta^2+2)=2/(\theta^2+2)$,
one gets
$\pi_{i1}(\theta)=k_{i1}\exp[\int_{i^{-1}}^{i}k_{i2}\sigma^{-1}\log(\sqrt{2}/(\theta^2+2)^{1/2})\,d\sigma]
\propto(\theta^2+2)^{-1/2}$. Hence, the reference prior in this
example is given by $\pi_R(\theta,\sigma)\propto(\theta^2+2)^{-1/2}\sigma^{-1}$. Due to
its invariance property (Datta and Ghosh, 1996), in
the original $(\mu,\sigma)$ parameterization, the two-group
reference prior turns out to be
$\pi_R(\mu,\sigma)\propto\sigma^{-1}(\mu^2+2\sigma^2)^{-1/2}$.

Things simplify considerably if $\theta_1$ and $\theta_2$ are
orthogonal in the Fisherian sense, namely, $I_{12}(\theta)=0$
(Huzurbazar, 1950; Cox and Reid, 1987). Then if
$I_{11}(\theta)$ and $I_{22}(\theta)$ factor respectively as
$h_{11}(\theta_1)h_{12}(\theta_2)$ and $h_{21}(\theta_1)h_{22}(\theta_2)$,
as a special case of a more general
result of Datta and Ghosh (1995c), it follows
that the two-group reference prior is given by

$$
h_{11}^{1/2}(\theta_1)\,h_{22}^{1/2}(\theta_2).
$$

Example 3. As an illustration of the above, consider
the celebrated Neyman-Scott problem (Berger
and Bernardo, 1992a, 1992b). Consider a fixed effects
one-way balanced normal ANOVA model where
the number of observations per cell is fixed, but the
number of cells grows to infinity. In symbols, let
$X_{i1},\ldots,X_{ik}\mid\theta_i$ be mutually independent $N(\theta_i,\sigma^2)$,
$k\ge2$, $i=1,\ldots,n$, all parameters being assumed unknown.
Let $S=\sum_{i=1}^{n}\sum_{j=1}^{k}(X_{ij}-\bar X_i)^2/(n(k-1))$.
Then the MLE of $\sigma^2$ is given by $(k-1)S/k$, which
converges in probability (as $n\to\infty$) to $(k-1)\sigma^2/k$,
and hence is inconsistent. Interestingly, Jeffreys'
prior in this case also produces an inconsistent
estimator of $\sigma^2$, but the Berger-Bernardo reference
prior does not.

To see this, we begin with the Fisher information matrix
$I(\theta_1,\ldots,\theta_n,\sigma^2)=k\,\mathrm{Diag}(\sigma^{-2},\ldots,\sigma^{-2},(1/2)n\sigma^{-4})$.
Hence, Jeffreys' prior $\pi_J(\theta_1,\ldots,\theta_n,\sigma^2)\propto(\sigma^2)^{-n/2-1}$,
which leads to the marginal posterior
$\pi_J(\sigma^2\mid X)\propto(\sigma^2)^{-nk/2-1}\exp[-n(k-1)S/(2\sigma^2)]$ of $\sigma^2$,
$X$ denoting the entire data set. Then the posterior mean
of $\sigma^2$ is given by $n(k-1)S/(nk-2)$, while the posterior
mode is given by $n(k-1)S/(nk+2)$. Both
are inconsistent estimators of $\sigma^2$, as these converge
in probability to $(k-1)\sigma^2/k$ as $n\to\infty$.

In contrast, by the result of Datta and Ghosh
(1995c), the two-group reference prior
$\pi_R(\theta_1,\ldots,\theta_n,\sigma^2)\propto(\sigma^2)^{-1}$. This leads to the marginal posterior
$\pi_R(\sigma^2\mid X)\propto(\sigma^2)^{-n(k-1)/2-1}\exp[-n(k-1)S/(2\sigma^2)]$
of $\sigma^2$. Now the posterior mean is given by
$n(k-1)S/(n(k-1)-2)$, while the posterior mode is given by
$n(k-1)S/(n(k-1)+2)$. Both are consistent estimators
of $\sigma^2$.
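
The contrast between the two posteriors can be seen in a small simulation. The following is a sketch under assumed settings (number of cells, cell size, true variance, the range of cell means and the seed are all arbitrary choices made here); the function name is hypothetical. It computes S from simulated data and then evaluates the MLE and the two posterior means given above.

```python
import random

def neyman_scott_estimates(n, k, sigma2, seed=0):
    # Simulate n cells with k normal observations each; the cell means are
    # arbitrary nuisance values drawn once (an assumption of this sketch).
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    within_ss = 0.0
    for _ in range(n):
        cell_mean = rng.uniform(-10.0, 10.0)
        xs = [rng.gauss(cell_mean, sigma) for _ in range(k)]
        xbar = sum(xs) / k
        within_ss += sum((x - xbar) ** 2 for x in xs)
    S = within_ss / (n * (k - 1))
    return {
        "mle": (k - 1) * S / k,
        "jeffreys_mean": n * (k - 1) * S / (n * k - 2),
        "reference_mean": n * (k - 1) * S / (n * (k - 1) - 2),
    }

# With k = 2 and sigma^2 = 4, the inconsistent limit (k - 1) sigma^2 / k is 2.
estimates = neyman_scott_estimates(n=5000, k=2, sigma2=4.0)
print(estimates)
```

With many cells, the MLE and the posterior mean under Jeffreys' prior should settle near 2, while the posterior mean under the reference prior should settle near the true value 4.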

Example 4 (Ratio of normal means). Let $X_1$
and $X_2$ be two independent $N(\theta\mu,1)$ and $N(\mu,1)$ random
variables, where the parameter of interest is $\theta$. This is
the celebrated Fieller-Creasy problem. The Fisher
information matrix in this case is

$$
I(\theta,\mu)=\begin{pmatrix}\mu^2&\mu\theta\\ \mu\theta&1+\theta^2\end{pmatrix}.
$$

With the transformation $\phi=\mu(1+\theta^2)^{1/2}$, one obtains
$I(\theta,\phi)=\mathrm{Diag}(\phi^2(1+\theta^2)^{-2},1)$. Again, by Datta
and Ghosh (1995c), the two-group reference prior is
$\pi_R(\theta,\phi)\propto(1+\theta^2)^{-1}$.
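
The orthogonalizing effect of this transformation can be verified numerically. The sketch below assumes that the two observations have unit variances, with means theta*mu and mu (a reading of the example adopted here for illustration), and uses the fact that for unit-variance normals the Fisher information equals J^T J, where J is the Jacobian of the mean vector. The Jacobian is approximated by central differences; the evaluation point and function names are arbitrary.

```python
import math

def mean_vector(theta, phi):
    # Means of (X1, X2) when X1 ~ N(theta * mu, 1), X2 ~ N(mu, 1)
    # and phi = mu * sqrt(1 + theta^2).
    mu = phi / math.sqrt(1.0 + theta * theta)
    return (theta * mu, mu)

def fisher_information(theta, phi, h=1e-5):
    # Information matrix J^T J in the (theta, phi) parameterization,
    # with J obtained by central finite differences.
    def column(d_theta, d_phi):
        up = mean_vector(theta + d_theta, phi + d_phi)
        down = mean_vector(theta - d_theta, phi - d_phi)
        return [(u - d) / (2.0 * h) for u, d in zip(up, down)]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    j_theta = column(h, 0.0)
    j_phi = column(0.0, h)
    return [[dot(j_theta, j_theta), dot(j_theta, j_phi)],
            [dot(j_phi, j_theta), dot(j_phi, j_phi)]]

theta0, phi0 = 0.7, 2.5          # arbitrary evaluation point
info = fisher_information(theta0, phi0)
expected_11 = phi0 ** 2 / (1.0 + theta0 ** 2) ** 2
print(info, expected_11)
```

The off-diagonal entries should vanish up to numerical error, and the diagonal should match phi^2 (1 + theta^2)^{-2} and 1.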

Example 5 (Random effects model). This ex- 
ample has been visited and revisited on several oc- 
casions. Berger and Bernardo (1992b) first found ref- 
erence priors for variance components in this prob- 
lem when the number of observations per cell is the 
same. Later, Ye (1994) and Datta and Ghosh (1995c, 
1995d) also found reference priors for this problem. 
The case involving unequal number of observations 
per cell was considered by Chaloner (1987) and Dat- 
ta, Ghosh and Kim (2002). 

For simplicity, we consider here only the case with
equal number of observations per cell. Let
$Y_{ij}=m+\alpha_i+e_{ij}$, $j=1,\ldots,n$, $i=1,\ldots,k$. Here $m$ is an
unknown parameter, while $\alpha_i$'s and $e_{ij}$'s are mutually
independent with $\alpha_i$'s i.i.d. $N(0,\sigma_\alpha^2)$ and $e_{ij}$'s
i.i.d. $N(0,\sigma^2)$. The parameters $m$, $\sigma_\alpha^2$ and $\sigma^2$ are
all unknown. We write $\bar Y_i=\sum_{j=1}^{n}Y_{ij}/n$, $i=1,\ldots,k$,
and $\bar Y=\sum_{i=1}^{k}\bar Y_i/k$. The minimal sufficient statistic
is $(\bar Y,T,S)$, where $T=n\sum_{i=1}^{k}(\bar Y_i-\bar Y)^2$ and
$S=\sum_{i=1}^{k}\sum_{j=1}^{n}(Y_{ij}-\bar Y_i)^2$.

The different parameters of interest that we consider
are $m$, $\sigma_\alpha^2/\sigma^2$ and $\sigma^2$. The common mean $m$
is of great relevance in meta analysis (cf. Morris
and Normand, 1992). Ye (1994) pointed out that
the variance ratio $\sigma_\alpha^2/\sigma^2$ is of considerable interest
in genetic studies. The parameter is also of importance
to animal breeders, psychologists and others.
Datta and Ghosh (1995d) have discussed the
importance of $\sigma^2$, the error variance. In order to
find reference priors for each one of these parameters,
we first make the one-to-one transformation
from $(m,\sigma_\alpha^2,\sigma^2)$ to $(m,r,u)$, where $r=\sigma^{-2}$ and
$u=\sigma^2/(n\sigma_\alpha^2+\sigma^2)$. Thus, $\sigma_\alpha^2/\sigma^2=(1-u)/(nu)$, and the
likelihood $L(m,r,u)$ can be expressed as

$$
L(m,r,u)=r^{nk/2}u^{k/2}\exp[-(r/2)\{nku(\bar Y-m)^2+uT+S\}].
$$
Then the Fisher information matrix simplifies to
$I(m,r,u)=k\,\mathrm{Diag}(nru,\,n/(2r^2),\,1/(2u^2))$. From Theorem 1
of Datta and Ghosh (1995c), it follows now
that when $m$, $r$ and $u$ are the respective parameters
of interest, while the other two are nuisance
parameters, the reference priors are given respectively
by $\pi_{1R}(m,r,u)=1$, $\pi_{2R}(m,r,u)=r^{-1}$ and
$\pi_{3R}(m,r,u)=u^{-1}$.

3.2 General Divergence Priors 

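Next, back to the one-parameter case, we consider
the more general distance (Amari, 1982; Cressie and
Read, 1984)

$$
D^{\pi}=\left[1-\int\!\!\int\left\{\frac{\pi(\theta)}{\pi(\theta\mid X)}\right\}^{\beta}
\pi(\theta\mid X)\,m^{\pi}(X)\,dX\,d\theta\right]\Big/\{\beta(1-\beta)\},
\qquad\beta<1,
\tag{3.9}
$$

which is to be interpreted as its limit when $\beta\to0$.
This limit is the K-L distance as considered in Bernardo
(1979). Also, $\beta=1/2$ gives the Bhattacharyya-Hellinger
(Bhattacharyya, 1943; Hellinger, 1909) distance,
and $\beta=-1$ leads to the chi-square distance
(Clarke and Sun, 1997, 1999). In order to maximize
$D^{\pi}$ with respect to a prior $\pi$, one re-expresses (3.9)
as

$$
\begin{aligned}
D^{\pi}&=\left[1-\int\!\!\int\pi^{\beta+1}(\theta)\,\pi^{-\beta}(\theta\mid X)\,
L_n(\theta)\,dX\,d\theta\right]\Big/\{\beta(1-\beta)\}\\
&=\left[1-\int\pi^{\beta+1}(\theta)\,E[\{\pi^{-\beta}(\theta\mid X)\}\mid\theta]\,d\theta\right]
\Big/\{\beta(1-\beta)\}.
\end{aligned}
\tag{3.10}
$$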

Hence, from (3.10), maximization of $D^{\pi}$ amounts to
minimization (maximization) of

$$
\int\pi^{\beta+1}(\theta)\,E[\{\pi^{-\beta}(\theta\mid X)\}\mid\theta]\,d\theta
\tag{3.11}
$$

for $0<\beta<1$ ($\beta<0$). First consider the case
$0<\beta<1$. From Theorem 1, the posterior of $\theta$ is

$$
\pi(\theta\mid X)=\frac{\sqrt{n}\,\hat I_n^{1/2}}{(2\pi)^{1/2}}
\exp\left[-\frac{n}{2}(\theta-\hat\theta_n)^2\hat I_n\right][1+O_p(n^{-1/2})].
\tag{3.12}
$$

Thus,

$$
\pi^{-\beta}(\theta\mid X)=n^{-\beta/2}(2\pi)^{\beta/2}\hat I_n^{-\beta/2}
\exp\left[\frac{n\beta}{2}(\theta-\hat\theta_n)^2\hat I_n\right][1+O_p(n^{-1/2})].
\tag{3.13}
$$

Following the shrinkage argument, and noting that
conditional on $\theta$, $\hat I_n\to I(\theta)$ in probability, while
$n(\theta-\hat\theta_n)^2\hat I_n\to\chi^2_1$ in distribution,
it follows heuristically from (3.13) that

$$
E[\pi^{-\beta}(\theta\mid X)]=n^{-\beta/2}(2\pi)^{\beta/2}[I(\theta)]^{-\beta/2}(1-\beta)^{-1/2}
[1+O_p(n^{-1/2})].
\tag{3.14}
$$

Hence, from (3.14), considering only the leading term,
for $0<\beta<1$, minimization of (3.11) with respect to $\pi$
amounts to minimization of $\int[\pi(\theta)/I^{1/2}(\theta)]^{\beta}\pi(\theta)\,d\theta$
with respect to $\pi$ subject to $\int\pi(\theta)\,d\theta=1$. A simple
application of Hölder's inequality shows that this
minimization takes place when $\pi(\theta)\propto I^{1/2}(\theta)$. Similarly,
for $-1<\beta<0$, $\pi(\theta)\propto I^{1/2}(\theta)$ provides the desired
maximization of the expected distance between
the prior and the posterior. The K-L distance, that
is, when $\beta\to0$, has already been considered earlier.

Remark 2. Equation (3.14) also holds for $\beta<-1$.
However, in this case, it is shown in Ghosh,
Mergel and Liu (2011) that the integral
$\int\{\pi(\theta)/I^{1/2}(\theta)\}^{-\beta}\pi(\theta)\,d\theta$ is uniquely minimized with
respect to $\pi(\theta)\propto I^{1/2}(\theta)$, and there exists no maximizer
of this integral when $\int\pi(\theta)\,d\theta=1$. Thus, in this
case, there does not exist any prior which maximizes
the posterior-prior distance.

Remark 3. Surprisingly, Jeffreys' prior is not
necessarily the solution when $\beta=-1$ (the chi-square
divergence). In this case, the first-order asymptotics
does not work since $\pi^{\beta+1}(\theta)=1$ for all $\theta$. However,
retaining also the $O_p(n^{-1})$ term as given in Theorem 1,
Ghosh, Mergel and Liu (2011) have found in
this case the solution

$$
\pi(\theta)\propto\exp\left[\int^{\theta}\frac{2g_3(t)-I'(t)}{4I(t)}\,dt\right],
\qquad
g_3(t)=E\left[-\frac{d^3\log p(X\mid\theta)}{d\theta^3}\,\bigg|\,\theta=t\right].
$$

We shall refer to this prior as $\pi_{GML}(\theta)$. We will show by examples
that this prior may differ from Jeffreys' prior. But
first we will establish a hitherto unknown invariance
property of this prior under one-to-one reparameterization.

Theorem 2. Suppose that $\phi$ is a one-to-one twice differentiable function of $\theta$. Then $\pi_{GML}(\phi)=C\,\pi_{GML}(\theta)\,|d\theta/d\phi|$, where $C\,(>0)$, the constant of proportionality, does not involve any parameters.

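Proof. Without loss of generality, assume that $\phi$ is a nondecreasing function of $\theta$. By the identity

$$g_3(\phi)=I'(\phi)+E\Bigl[\Bigl(\frac{d^2\log f}{d\phi^2}\Bigr)\Bigl(\frac{d\log f}{d\phi}\Bigr)\Bigr], \tag{3.15}$$

$\pi'_{GML}(\phi)/\pi_{GML}(\phi)$ reduces to

$$\frac{\pi'_{GML}(\phi)}{\pi_{GML}(\phi)}=\frac{I'(\phi)+2E[(d^2\log f/d\phi^2)(d\log f/d\phi)]}{4I(\phi)}.$$

Next, from the relation $I(\phi)=I(\theta)(d\theta/d\phi)^2$, one gets the identities

$$I'(\phi)=I'(\theta)(d\theta/d\phi)^3+2I(\theta)(d\theta/d\phi)(d^2\theta/d\phi^2); \tag{3.16}$$

$$E\Bigl[\Bigl(\frac{d^2\log f}{d\phi^2}\Bigr)\Bigl(\frac{d\log f}{d\phi}\Bigr)\Bigr]
=E\Bigl[\Bigl\{\frac{d^2\log f}{d\theta^2}\Bigl(\frac{d\theta}{d\phi}\Bigr)^2+\frac{d\log f}{d\theta}\,\frac{d^2\theta}{d\phi^2}\Bigr\}\Bigl\{\frac{d\log f}{d\theta}\,\frac{d\theta}{d\phi}\Bigr\}\Bigr]. \tag{3.17}$$

From (3.15)-(3.17), one gets, after simplification,

$$\frac{\pi'_{GML}(\phi)}{\pi_{GML}(\phi)}=\frac{\pi'_{GML}(\theta)}{\pi_{GML}(\theta)}\,\frac{d\theta}{d\phi}+\frac{d^2\theta/d\phi^2}{d\theta/d\phi}. \tag{3.18}$$

Now, on integration, it follows from (3.18) that $\pi_{GML}(\phi)=C\,\pi_{GML}(\theta)\,(d\theta/d\phi)$, which proves the theorem. □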

Example 6. Consider the one-parameter exponential family of distributions with $p(X\mid\theta)=\exp[\theta X-\psi(\theta)+h(X)]$. Then $g_3(\theta)=I'(\theta)$ so that $\pi(\theta)\propto\exp\bigl[\frac14\int\frac{I'(\theta)}{I(\theta)}\,d\theta\bigr]=I^{1/4}(\theta)$, which is different from Jeffreys' $I^{1/2}(\theta)$ prior. Because of the invariance result proved in Theorem 2, in particular, for the Binomial$(n,p)$ problem, noting that $p=\exp(\theta)/[1+\exp(\theta)]$, one gets $\pi_{GML}(p)\propto p^{-3/4}(1-p)^{-3/4}$, which is a Beta$(\frac14,\frac14)$ prior, different from Jeffreys' Beta$(\frac12,\frac12)$ prior, Laplace's Beta$(1,1)$ prior or Haldane's improper Beta$(0,0)$ prior. Similarly, for the
Poisson$(\lambda)$ case, with canonical parameter $\theta=\log\lambda$ and $I(\theta)=\lambda$, Theorem 2 gives $\pi_{GML}(\lambda)\propto\lambda^{1/4}\cdot\lambda^{-1}=\lambda^{-3/4}$, again different from Jeffreys' $\pi_J(\lambda)\propto\lambda^{-1/2}$ prior. However, for the $N(\theta,1)$ distribution, since $I(\theta)=1$ and $g_3(\theta)=I'(\theta)=0$, $\pi_{GML}(\theta)=c\,(>0)$, a constant, which is the same as Jeffreys' prior. It may be pointed out also that for the one-parameter exponential family, for the chi-square divergence, $\pi_{GML}$ differs from Hartigan's (1998) maximum likelihood prior $\pi_H(\theta)=I(\theta)$.
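
The Beta$(\frac14,\frac14)$ claim can be checked numerically without going through the canonical parameter. The sketch below assumes that the log-derivative of the chi-square divergence prior is $\{2g_3-I'\}/(4I)$ with $g_3=-E[\partial^3\log p/\partial p^3]$, works with a single Bernoulli$(p)$ observation, and compares the result with the log-derivative of $p^{-3/4}(1-p)^{-3/4}$. The function names are illustrative only.

```python
def gml_log_derivative(p):
    """(2*g3 - I') / (4*I) for one Bernoulli(p) observation, in the p scale."""
    fisher = 1.0 / (p * (1.0 - p))
    fisher_prime = -(1.0 - 2.0 * p) / (p * (1.0 - p)) ** 2
    # Third derivative of x*log(p) + (1-x)*log(1-p) is 2x/p^3 - 2(1-x)/(1-p)^3.
    # Average it over x with P(x=1) = p, then change sign to obtain g3.
    expected_third = p * (2.0 / p**3) - (1.0 - p) * (2.0 / (1.0 - p) ** 3)
    g3 = -expected_third
    return (2.0 * g3 - fisher_prime) / (4.0 * fisher)


def beta_quarter_log_derivative(p):
    """d/dp of log[p^(-3/4) * (1-p)^(-3/4)], the Beta(1/4, 1/4) kernel."""
    return -0.75 / p + 0.75 / (1.0 - p)


def jeffreys_log_derivative(p):
    """d/dp of log[p^(-1/2) * (1-p)^(-1/2)], the Beta(1/2, 1/2) kernel."""
    return -0.5 / p + 0.5 / (1.0 - p)


grid = [0.1, 0.25, 0.5, 0.8]
gaps = [abs(gml_log_derivative(p) - beta_quarter_log_derivative(p)) for p in grid]
print(max(gaps))
```

The two log-derivatives agree on the grid up to rounding, while the Jeffreys log-derivative differs away from $p=\frac12$.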

Example 7. For the one-parameter location family of distributions with $p(X\mid\theta)=f(X-\theta)$, where $f$ is a p.d.f., both $g_3(\theta)$ and $I(\theta)$ are constants, implying $I'(\theta)=0$. Hence, $\pi_{GML}(\theta)$ is of the form $\pi_{GML}(\theta)=\exp(k\theta)$ for some constant $k$. However, for the special case of a symmetric $f$, that is, $f(X)=f(-X)$ for all $X$, $g_3(\theta)=0$, and then $\pi_{GML}(\theta)$ reduces once again to $\pi(\theta)=c$, which is the same as Jeffreys' prior.

Example 8. For the general scale family of distributions with $p(X\mid\theta)=\theta^{-1}f(X/\theta)$, $\theta>0$, where $f$ is a p.d.f., $I(\theta)=c_1/\theta^2$ for some constant $c_1\,(>0)$, while $g_3(\theta)=c_2/\theta^3$ for some constant $c_2$. Then $\pi_{GML}(\theta)\propto\exp(c\log\theta)=\theta^{c}$ for some constant $c$. In particular, when $p(X\mid\theta)=\theta^{-1}\exp(-X/\theta)$, $\pi_{GML}(\theta)\propto\theta^{-3/2}$, different from Jeffreys' $\pi_J(\theta)\propto\theta^{-1}$ for the general scale family of distributions.
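
As a worked check of the exponential case, assume the chi-square divergence prior has log-derivative $\{2g_3(\theta)-I'(\theta)\}/\{4I(\theta)\}$ with $g_3(\theta)=-E[\partial^3\log p/\partial\theta^3\mid\theta]$. With $\log p(X\mid\theta)=-\log\theta-X/\theta$ and $E(X\mid\theta)=\theta$,

$$\frac{\partial^2\log p}{\partial\theta^2}=\frac{1}{\theta^2}-\frac{2X}{\theta^3},\qquad
\frac{\partial^3\log p}{\partial\theta^3}=-\frac{2}{\theta^3}+\frac{6X}{\theta^4},$$

$$I(\theta)=\frac{1}{\theta^2},\qquad I'(\theta)=-\frac{2}{\theta^3},\qquad g_3(\theta)=-\frac{4}{\theta^3},$$

$$\frac{2g_3(\theta)-I'(\theta)}{4I(\theta)}=\frac{-8/\theta^3+2/\theta^3}{4/\theta^2}=-\frac{3}{2\theta},
\qquad\text{so that}\qquad \pi_{GML}(\theta)\propto\exp\Bigl[-\frac32\log\theta\Bigr]=\theta^{-3/2}.$$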

The multiparameter extension of the general divergence prior has been explored in the Ph.D. dissertation of Liu (2009). Among other things, he has shown that in the absence of any nuisance parameters, for $|\beta|<1$, the divergence prior is Jeffreys' prior. However, on the boundary, namely, $\beta=-1$, priors other than Jeffreys' prior emerge.

4. PROBABILITY MATCHING PRIORS 
4.1 Motivation and First-Order Matching 

As mentioned in the Introduction, probability matching priors are intended to achieve Bayes-frequentist synthesis. Specifically, these priors are required to provide asymptotically the same coverage probability of the Bayesian credible intervals as the corresponding frequentist counterparts. Over the years, there have been several versions of such priors: quantile matching priors, matching priors for distribution functions, HPD matching priors and matching priors associated with likelihood ratio statistics. Datta and Mukerjee (2004) provided a detailed account of all these priors. In this article I will be concerned only with quantile matching priors.

A general definition of quantile matching priors is as follows: Suppose $X_1,\ldots,X_n\mid\theta$ are i.i.d. with common p.d.f. $f(X\mid\theta)$, where $\theta$ is a real-valued parameter. Assume all the needed regularity conditions for the asymptotic expansion of the posterior around $\hat\theta_n$, the MLE of $\theta$. We continue with the notation of the previous section. For $0<\alpha<1$, let $\theta^{\pi}_{1-\alpha}(X_1,\ldots,X_n)=\theta^{\pi}_{1-\alpha}$ denote the $(1-\alpha)$th asymptotic posterior quantile of $\theta$ based on the prior $\pi$, that is,

$$P^{\pi}[\theta\le\theta^{\pi}_{1-\alpha}\mid X_1,\ldots,X_n]=1-\alpha+O_p(n^{-r}) \tag{4.1}$$

for some $r>0$. If now $P[\theta\le\theta^{\pi}_{1-\alpha}\mid\theta]=1-\alpha+O_p(n^{-r})$, then some order of probability matching is achieved. If $r=1$, we call $\pi$ a first-order probability matching prior. If $r=3/2$, we call $\pi$ a second-order probability matching prior.

We first provide an intuitive argument for why Jeffreys' prior is a first-order probability matching prior in the absence of nuisance parameters. If $X_1,\ldots,X_n\mid\theta$ are i.i.d. $N(\theta,1)$ and $\pi(\theta)=1$, $-\infty<\theta<\infty$, then the posterior $\pi(\theta\mid X_1,\ldots,X_n)$ is $N(\bar X_n,n^{-1})$. Now writing $z_{1-\alpha}$ as the $100(1-\alpha)\%$ quantile of the $N(0,1)$ distribution, one gets

$$P[\sqrt n(\theta-\bar X_n)\le z_{1-\alpha}\mid X_1,\ldots,X_n]=1-\alpha=P[\sqrt n(\bar X_n-\theta)\ge-z_{1-\alpha}\mid\theta], \tag{4.2}$$

so that the one-sided credible interval with upper limit $\bar X_n+z_{1-\alpha}/\sqrt n$ for $\theta$ has exact frequentist coverage probability $1-\alpha$.
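
The exact matching in the normal example can be illustrated by simulation. The following is a minimal sketch, not part of the original argument: it assumes unit variance, a flat prior, and hypothetical values for the true mean, the sample size and the number of replications, and it estimates how often the true mean falls below the posterior $(1-\alpha)$ quantile $\bar X_n+z_{1-\alpha}/\sqrt n$.

```python
import random
from statistics import NormalDist


def one_sided_coverage(theta, n, alpha, reps, seed=0):
    """Monte Carlo frequentist coverage of the flat-prior posterior quantile."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha)  # standard normal quantile z_{1-alpha}
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        upper = xbar + z / n ** 0.5  # posterior (1 - alpha) quantile of theta
        if theta <= upper:
            hits += 1
    return hits / reps


# Illustrative settings only: true mean 1.3, n = 5, alpha = 0.05.
coverage = one_sided_coverage(theta=1.3, n=5, alpha=0.05, reps=20000)
print(round(coverage, 3))
```

Up to Monte Carlo error the estimated coverage should be close to $1-\alpha=0.95$ for any sample size, since the matching here is exact rather than asymptotic.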

The above exact matching does not always hold. However, if $X_1,\ldots,X_n\mid\theta$ are i.i.d., then $\hat\theta_n\mid\theta$ is asymptotically $N(\theta,(nI(\theta))^{-1})$. Then, by the delta method, $g(\hat\theta_n)\mid\theta$ is asymptotically $N[g(\theta),(g'(\theta))^2(nI(\theta))^{-1}]$. So if $g'(\theta)=I^{1/2}(\theta)$ so that $g(\theta)=\int^{\theta}I^{1/2}(t)\,dt$, $\sqrt n[g(\hat\theta_n)-g(\theta)]\mid\theta$ is asymptotically $N(0,1)$. Hence, from (4.2), with the uniform prior $\pi(\phi)=1$ for $\phi=g(\theta)$, coverage matching is asymptotically achieved for $\phi$. This leads to the prior $\pi(\theta)=d\phi/d\theta=g'(\theta)=I^{1/2}(\theta)$ for $\theta$.
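
A short worked instance of this argument, given only as an illustration, is the Poisson$(\lambda)$ model, for which $I(\lambda)=1/\lambda$ and $\hat\lambda_n=\bar X_n$:

$$g'(\lambda)=I^{1/2}(\lambda)=\lambda^{-1/2},\qquad g(\lambda)=\int^{\lambda}t^{-1/2}\,dt=2\sqrt\lambda,$$

$$\sqrt n\,\bigl[2\sqrt{\hat\lambda_n}-2\sqrt\lambda\bigr]\ \text{is asymptotically}\ N(0,1),\qquad
\pi(\lambda)=\frac{d}{d\lambda}\bigl(2\sqrt\lambda\bigr)=\lambda^{-1/2}.$$

The uniform prior on the variance-stabilized scale $\phi=2\sqrt\lambda$ thus corresponds to Jeffreys' prior on the original scale.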

Datta and Mukerjee (2004, pages 14-21) proved the result in a formal manner. They used the two basic tools of Section 3. In the absence of nuisance parameters, they showed that a first-order matching prior for $\theta$ is a solution of the differential equation

$$\frac{d}{d\theta}\{\pi(\theta)I^{-1/2}(\theta)\}=0, \tag{4.3}$$

so that Jeffreys' prior is the unique first-order matching prior. However, it does not always satisfy the second-order matching property.

4.2 Second-Order Matching 

In order that the matching is accomplished up to $O(n^{-3/2})$ (second-order matching), one needs an asymptotic expansion of the posterior distribution function up to the $O(n^{-1})$ term, and to set up a second differential equation in addition to (4.3). This equation is given by (cf. Mukerjee and Dey, 1993; Mukerjee and Ghosh, 1997)

$$\frac13\frac{d}{d\theta}\{\pi(\theta)I^{-2}(\theta)g_3(\theta)\}+\frac{d^2}{d\theta^2}\{\pi(\theta)I^{-1}(\theta)\}=0, \tag{4.4}$$

where, as before, $g_3(\theta)=-E\bigl[\frac{d^3\log f(X\mid\theta)}{d\theta^3}\bigm|\theta\bigr]$. If Jeffreys' prior satisfies (4.4), then it is the unique second-order matching prior. While for the location and scale families of distributions this is indeed the case, it is not true in general. Of course, in such an instance, there does not exist any second-order matching prior.

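To see this, for $\pi_J(\theta)=I^{1/2}(\theta)$, (4.4) reduces to

$$\frac13\frac{d}{d\theta}[I^{-3/2}(\theta)g_3(\theta)]+\frac{d^2}{d\theta^2}[I^{-1/2}(\theta)]=0,$$

which requires $\frac13I^{-3/2}(\theta)g_3(\theta)+\frac{d}{d\theta}(I^{-1/2}(\theta))$ to be a constant free from $\theta$. After some algebra, using $E[(d\log f/d\theta)^3\mid\theta]=3I'(\theta)-2g_3(\theta)$, the above expression simplifies to $-(1/6)E[(\frac{d\log f}{d\theta})^3\mid\theta]/I^{3/2}(\theta)$.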
It is easy to check now that for the one-parameter location and scale family of distributions, the above expression does not depend on $\theta$. However, for the one-parameter exponential family of distributions with canonical parameter $\theta$, the same holds if and only if $I'(\theta)/I^{3/2}(\theta)$ does not depend on $\theta$, or, in other words, $I^{-1/2}(\theta)$ is linear in $\theta$, that is, $I(\theta)=(a+b\theta)^{-2}$ for some constants $a$ and $b$. Another interesting example is given below.

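Example 9. $(X_1,X_2)^T\sim N_2\bigl[\binom{0}{0},\bigl(\begin{smallmatrix}1&\rho\\ \rho&1\end{smallmatrix}\bigr)\bigr]$. Writing $L_{1,1,1}=E[(d\log f/d\rho)^3\mid\rho]$, one can verify that $I(\rho)=(1+\rho^2)/(1-\rho^2)^2$ and $L_{1,1,1}=-\frac{2\rho(3+\rho^2)}{(1-\rho^2)^3}$ so that $L_{1,1,1}/I^{3/2}(\rho)$ is not a constant. Hence, $\pi_J$ is not a second-order matching prior, and there does not exist any second-order matching prior in this example.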
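
The non-constancy of the ratio is easy to see numerically. The sketch below takes the closed forms $I(\rho)=(1+\rho^2)/(1-\rho^2)^2$ and $E[(d\log f/d\rho)^3\mid\rho]=-2\rho(3+\rho^2)/(1-\rho^2)^3$ as given and simply evaluates their ratio at a few illustrative values of $\rho$; the function names are hypothetical.

```python
def fisher_info(rho):
    """Fisher information for rho in the bivariate normal with unit variances."""
    return (1 + rho**2) / (1 - rho**2) ** 2


def score_third_moment(rho):
    """Third moment of the score, taken from the closed form quoted in the text."""
    return -2 * rho * (3 + rho**2) / (1 - rho**2) ** 3


def matching_ratio(rho):
    """Quantity that must be free of rho for Jeffreys' prior to be second-order matching."""
    return score_third_moment(rho) / fisher_info(rho) ** 1.5


ratios = {rho: matching_ratio(rho) for rho in (-0.5, 0.0, 0.3, 0.8)}
for rho, value in ratios.items():
    print(rho, round(value, 4))
```

The ratio simplifies to $-2\rho(3+\rho^2)/(1+\rho^2)^{3/2}$, which vanishes at $\rho=0$ and changes with $\rho$ elsewhere.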

4.3 First-Order Quantile Matching Priors in the 
Presence of Nuisance Parameters 

The parameter of interest is still real-valued, but there may be one or more nuisance parameters. To fix ideas, suppose $\theta=(\theta_1,\ldots,\theta_p)$, where $\theta_1$ is the parameter of interest, while $\theta_2,\ldots,\theta_p$ are the nuisance parameters. As shown by Welch and Peers (1963) and later more rigorously by Datta and Ghosh (1995a) and Datta (1996), writing $I^{-1}=((I^{jk}))$, the probability matching equation is given by

$$\sum_{j=1}^{p}\frac{\partial}{\partial\theta_j}\{\pi(\theta)I^{j1}(I^{11})^{-1/2}\}=0. \tag{4.5}$$

Example 1 (Continued). First consider $\mu$ as the parameter of interest, and $\sigma$ the nuisance parameter. Since each element of the inverse of the Fisher information matrix is a constant multiple of $\sigma^2$, any prior $\pi(\mu,\sigma)\propto g(\sigma)$, $g$ arbitrary, satisfies (4.5). Conversely, when $\sigma$ is the parameter of interest, and $\mu$ is the nuisance parameter, any prior $\pi(\mu,\sigma)\propto\sigma^{-1}g(\mu)$ satisfies (4.5).

A special case considered in Tibshirani (1989) is of interest. Here $\theta_1$ is orthogonal to $(\theta_2,\ldots,\theta_p)$ in the Fisherian sense, that is, $I^{j1}=0$ for $j=2,3,\ldots,p$. With orthogonality, (4.5) simplifies to

$$\frac{\partial}{\partial\theta_1}\{\pi(\theta)I_{11}^{-1/2}\}=0$$

(since $I^{11}=I_{11}^{-1}$). This leads to $\pi(\theta)=I_{11}^{1/2}h(\theta_2,\ldots,\theta_p)$, where $h$ is arbitrary. Often a second-order matching prior removes the arbitrariness of $h$. We will see an example later in this section. However, this need not always be the case, and, indeed, as seen earlier in the one-parameter case, second-order matching priors may not always exist. We will address this issue later in this section.

A special choice is $h=1$. The resultant prior $\pi(\theta)=I_{11}^{1/2}$ bears some intuitive appeal. Since under orthogonality, $\sqrt n(\hat\theta_{1n}-\theta_1)\mid\theta$ is asymptotically $N(0,I_{11}^{-1}(\theta))$, one may expect $I_{11}^{1/2}(\theta)$ to be a first-order probability matching prior. This prior is only a member within the class of priors $\pi(\theta)=I_{11}^{1/2}h(\theta_2,\ldots,\theta_p)$, as found by Tibshirani (1989), and admittedly need not be second-order matching even when the latter exists. A recent article by Staicu and Reid (2008) has proved some interesting properties of the prior $\pi(\theta)=I_{11}^{1/2}(\theta)$. This prior is also considered in Ghosh and Mukerjee (1992).

For a symmetric location-scale family of distributions, that is, when $f(X)=f(-X)$, $c_2=0$, that is, $\mu$ and $\sigma$ are orthogonal. Now, when $\mu$ is the parameter of interest and $\sigma$ is the nuisance parameter, the class of first-order matching priors $\pi_1(\mu,\sigma)$ is characterized by $h_1(\sigma)$, where $h_1$ is arbitrary. Similarly, when $\sigma$ is the parameter of interest and $\mu$ is the nuisance parameter, the class of first-order matching priors is characterized by $\pi_2(\mu,\sigma)=\sigma^{-1}h_2(\mu)$, where $h_2$ is arbitrary. The intersection of the two classes leads again to the unique prior $\pi(\mu,\sigma)=\sigma^{-1}$.

Example 2 (Continued). Let $X_1,\ldots,X_n\mid\mu,\sigma$ be i.i.d. $N(\mu,\sigma^2)$, and $\theta=\mu/\sigma$ is again the parameter of interest. In order to find a parameter $\phi$ which is orthogonal to $\theta$, we rewrite the p.d.f. in the form

$$f(X\mid\theta,\sigma)=(2\pi\sigma^2)^{-1/2}\exp\Bigl[-\frac12\Bigl(\frac{X}{\sigma}-\theta\Bigr)^2\Bigr]. \tag{4.6}$$

Then the Fisher information matrix is

$$I(\theta,\sigma)=\begin{pmatrix}1&\theta/\sigma\\ \theta/\sigma&\sigma^{-2}(\theta^2+2)\end{pmatrix}.$$

It turns out now that if we reparameterize from $(\theta,\sigma)$ to $(\theta,\phi)$, where $\phi=\sigma(\theta^2+2)^{1/2}$, then $\theta$ and $\phi$ are orthogonal with the corresponding Fisher information matrix given by $I(\theta,\phi)=\mathrm{Diag}[2(\theta^2+2)^{-1},\phi^{-2}(\theta^2+2)]$. Hence, the class of first-order matching priors when $\theta$ is the parameter of interest is given by $\pi(\theta,\phi)\propto(\theta^2+2)^{-1/2}h(\phi)$, where $h$ is arbitrary.
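
The orthogonality of $\theta$ and $\phi=\sigma(\theta^2+2)^{1/2}$ can be checked numerically by transforming the information matrix with the Jacobian of the reparameterization. The sketch below assumes the per-observation information matrix in $(\theta,\sigma)$ has entries $1$, $\theta/\sigma$ and $(\theta^2+2)/\sigma^2$; the function names and the evaluation point are illustrative.

```python
def info_theta_sigma(theta, sigma):
    """Per-observation Fisher information of N(theta*sigma, sigma^2) in (theta, sigma)."""
    return [[1.0, theta / sigma],
            [theta / sigma, (theta**2 + 2.0) / sigma**2]]


def info_theta_phi(theta, phi):
    """Fisher information in (theta, phi), computed as J^T I J."""
    w = (theta**2 + 2.0) ** 0.5
    sigma = phi / w
    # jac[i][j] = d(old_i)/d(new_j), old = (theta, sigma), new = (theta, phi).
    jac = [[1.0, 0.0],
           [-phi * theta / w**3, 1.0 / w]]
    info = info_theta_sigma(theta, sigma)
    return [[sum(jac[i][a] * info[i][j] * jac[j][b]
                 for i in range(2) for j in range(2))
             for b in range(2)]
            for a in range(2)]


theta0, phi0 = 0.7, 1.9  # arbitrary evaluation point
result = info_theta_phi(theta0, phi0)
print(result)
```

At any evaluation point the off-diagonal entries should vanish up to rounding, and the diagonal entries should equal $2(\theta^2+2)^{-1}$ and $\phi^{-2}(\theta^2+2)$.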



4.4 Second-Order Quantile Matching Priors in 
the Presence of Nuisance Parameters 

When $\theta_1$ is the parameter of interest, and $(\theta_2,\ldots,\theta_p)$ is the vector of nuisance parameters, the general class of second-order quantile matching priors is characterized in (2.4.11) and (2.4.12) of Datta and Mukerjee (2004, page 12). For simplicity, we consider only the case when $\theta_1$ is orthogonal to $(\theta_2,\ldots,\theta_p)$. In this case a first-order quantile matching prior $\pi(\theta_1,\theta_2,\ldots,\theta_p)\propto I_{11}^{1/2}(\theta)h(\theta_2,\ldots,\theta_p)$ is also second-order matching if and only if $h$ satisfies (cf. Datta and Mukerjee, 2004, page 27) the differential equation

$$\sum_{s=2}^{p}\sum_{u=2}^{p}\frac{\partial}{\partial\theta_u}\Bigl\{I_{11}^{-1/2}I^{su}E\Bigl(\frac{\partial^3\log f}{\partial\theta_1^2\,\partial\theta_s}\Bigm|\theta\Bigr)h\Bigr\}
+\frac{h}{6}\,\frac{\partial}{\partial\theta_1}\Bigl\{I_{11}^{-3/2}E\Bigl[\Bigl(\frac{\partial\log f}{\partial\theta_1}\Bigr)^3\Bigm|\theta\Bigr]\Bigr\}=0. \tag{4.7}$$

We revisit Examples 1-5 and provide complete, or at least partial, characterization of second-order quantile matching priors.

Example 1 (Continued). Let $f$ be symmetric so that $\mu$ and $\sigma$ are orthogonal. First let $\mu$ be the parameter of interest and $\sigma$ the nuisance parameter. Then since both the terms in (4.7) are zeroes, every first-order quantile matching prior of the form $\sigma^{-1}h(\sigma)=q(\sigma)$, say, is also second-order matching. This means that an arbitrary prior of the form $\pi(\mu,\sigma)$ is second-order matching as long as it is only a function of $\sigma$. On the other hand, if $\sigma$ is the parameter of interest and $\mu$ is the nuisance parameter, since the second term in (4.7) is zero, a first-order quantile matching prior of the form $\sigma^{-1}h(\mu)$ is also second-order matching if and only if $h(\mu)$ is a constant. Thus, the unique second-order quantile matching prior in this case is proportional to $\sigma^{-1}$, which is
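Jeffreys' independence prior.

Example 2 (Continued). Recall that in this case, writing $\theta=\mu/\sigma$ and $\phi=\sigma(\theta^2+2)^{1/2}$, the Fisher information matrix is $I(\theta,\phi)=\mathrm{Diag}[2(\theta^2+2)^{-1},\phi^{-2}(\theta^2+2)]$. Also,

$$E\Bigl[\Bigl(\frac{\partial\log f}{\partial\theta}\Bigr)^3\Bigm|\theta,\phi\Bigr]=-\frac{8\theta(\theta^2+3)}{(\theta^2+2)^3}
\quad\text{and}\quad
E\Bigl(\frac{\partial^3\log f}{\partial\theta^2\,\partial\phi}\Bigm|\theta,\phi\Bigr)=(4/\phi)(\theta^2+2)^{-2}.$$

With these expressions the left-hand side of (4.7) equals $2^{3/2}(\theta^2+2)^{-5/2}\{h(\phi)+\phi h'(\phi)-h(\phi)\}$, so that (4.7) holds if and only if $h'(\phi)=0$, that is, $h$ is a constant. This leads to the unique second-order quantile matching prior $\pi(\theta,\phi)\propto(\theta^2+2)^{-1/2}$. Back to the original $(\mu,\sigma)$ parameterization, since the Jacobian $|\partial(\mu,\sigma)/\partial(\theta,\phi)|$ equals $\sigma(\theta^2+2)^{-1/2}$, this leads to the prior $\pi(\mu,\sigma)\propto\sigma^{-1}$, Jeffreys' independence prior.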

Example 3 (Continued). Consider once again the Neyman-Scott example. Since the Fisher information matrix $I(\theta_1,\ldots,\theta_n,\sigma^2)=k\,\mathrm{Diag}(\sigma^{-2},\ldots,\sigma^{-2},n\sigma^{-2})$, $\sigma^2$ is orthogonal to $(\theta_1,\ldots,\theta_n)$. Now, the class of second-order matching priors is given by $\sigma^{-2}h(\mu_1,\ldots,\mu_n)$, where $h$ is arbitrary. Simple algebra shows that in this case both the first and second terms in (4.7) are zeroes so that every first-order quantile matching prior is also second-order matching.

Example 4 (Continued). From Tibshirani (1989), it follows that the class of first-order quantile matching priors for $\theta$ is of the form $(1+\theta^2)^{-1}h(\phi)$, where $h$ is arbitrary. Once again, since both the first and second terms in (4.7) are zeroes, every first-order quantile matching prior is also second-order matching.

Example 5 (Continued). Again from Tibshirani (1989), the classes of second-order matching priors when $m$, $r$ and $u$ are the parameters of interest are given respectively by $h_1(r,u)$, $r^{-1}h_2(m,u)$ and $u^{-1}h_3(m,r)$, where $h_1$, $h_2$ and $h_3$ are arbitrary nonnegative functions. Also, the prior $\pi_S(r,u)\propto(ru)^{-3/2}$ is second-order matching when $m$ is the parameter of interest. On the other hand, any first-order matching prior is also second-order matching when either $r$ or $u$ is the parameter of interest.

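It may be of interest to find an example where a reference prior is not a second-order matching prior. Consider the gamma p.d.f. $f(y\mid\mu,\lambda)=(\lambda^{\lambda}/\Gamma(\lambda))\exp[-\lambda y/\mu]\,y^{\lambda-1}\mu^{-\lambda}$, where the mean $\mu$ is the parameter of interest. The Fisher information matrix is given by $\mathrm{Diag}\bigl(\lambda\mu^{-2},\frac{d^2\log\Gamma(\lambda)}{d\lambda^2}-\frac1\lambda\bigr)$. Then the two-group reference prior of Bernardo (1979) is given by $\mu^{-1}\bigl[\frac{d^2\log\Gamma(\lambda)}{d\lambda^2}-\frac1\lambda\bigr]^{1/2}$, while the unique second-order quantile matching prior is given by $\lambda\mu^{-1}\bigl[\frac{d^2\log\Gamma(\lambda)}{d\lambda^2}-\frac1\lambda\bigr]$.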

In some of these examples, especially for the location and location-scale families, one gets exact rather than asymptotic matching. This is especially so when the matching prior is a right-invariant Haar prior. We will see some examples in the next section.

5. OTHER PRIORS 
5.1 Invariant Priors 

Very often objective priors are derived via some 
invariance criterion. We illustrate with the location- 
scale family of distributions. 

Let $X$ have p.d.f. $p(x\mid\mu,\sigma)=\sigma^{-1}f((x-\mu)/\sigma)$, $-\infty<\mu<\infty$, $0<\sigma<\infty$, where $f$ is a p.d.f. Then, as found in Section 4, the Fisher information matrix $I(\mu,\sigma)$ is of the form $I(\mu,\sigma)=\sigma^{-2}\bigl(\begin{smallmatrix}c_1&c_2\\ c_2&c_3\end{smallmatrix}\bigr)$. Hence, Jeffreys' general rule prior is $\pi_J(\mu,\sigma)\propto\sigma^{-2}$. This prior, as we will see in this section, corresponds to a left-invariant Haar prior. In contrast, Jeffreys' independence prior $\pi(\mu,\sigma)\propto\sigma^{-1}$ corresponds to a right-invariant Haar prior.

In order to demonstrate this, consider a group of linear transformations $G=\{g_{a,b}:-\infty<a<\infty,\ b>0\}$, where $g_{a,b}(x)=a+bx$. The induced group of transformations on the parameter space will be denoted by $\bar G$, where $\bar G=\{\bar g_{a,b}\}$ and $\bar g_{a,b}(\mu,\sigma)=(a+b\mu,b\sigma)$. The general theory of locally compact groups states that there exist two measures $\eta_1$ and $\eta_2$ on $\bar G$ such that $\eta_1$ is left-invariant and $\eta_2$ is right-invariant. What this means is that for all $g\in\bar G$ and $A$ a subset of $\bar G$, $\eta_1(gA)=\eta_1(A)$ and $\eta_2(Ag)=\eta_2(A)$, where $gA=\{gg^*:g^*\in A\}$ and $Ag=\{g^*g:g^*\in A\}$. The measures $\eta_1$ and $\eta_2$ are referred to respectively as left- and right-invariant Haar measures. For the location-scale family of distributions, the left- and right-invariant Haar priors turn out to be $\pi_L(\mu,\sigma)\propto\sigma^{-2}$ and $\pi_R(\mu,\sigma)\propto\sigma^{-1}$, respectively (cf. Berger, 1985, pages 406-407; Ghosh, Delampady and Samanta, 2006, pages 136-138).
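
The two invariance properties can be checked numerically on the affine group, identifying a group element with the pair (location, scale). The sketch below is only an illustration: the set $A=[0,1]\times[1,2]$, the group element $g=(0.5,2)$, the integration box and the midpoint rule are all arbitrary choices, and the function names are hypothetical.

```python
def measure(density, contains, steps=300):
    """Midpoint-rule integral of `density` over the set described by `contains`,
    on the box [0, 3] x [1, 4] in (location t, scale s) coordinates."""
    t0, t1, s0, s1 = 0.0, 3.0, 1.0, 4.0
    dt, ds = (t1 - t0) / steps, (s1 - s0) / steps
    total = 0.0
    for i in range(steps):
        t = t0 + (i + 0.5) * dt
        for j in range(steps):
            s = s0 + (j + 0.5) * ds
            if contains(t, s):
                total += density(t, s) * dt * ds
    return total


a_g, b_g = 0.5, 2.0  # illustrative group element g = (a, b)


def in_A(t, s):
    return 0.0 <= t <= 1.0 and 1.0 <= s <= 2.0


def in_gA(t, s):
    # (t, s) = g (a*, b*) = (a + b a*, b b*)  =>  a* = (t - a) / b, b* = s / b
    return in_A((t - a_g) / b_g, s / b_g)


def in_Ag(t, s):
    # (t, s) = (a*, b*) g = (a* + b* a, b* b)  =>  b* = s / b, a* = t - b* a
    return in_A(t - (s / b_g) * a_g, s / b_g)


def left_density(t, s):
    return s ** -2.0  # candidate left-invariant density


def right_density(t, s):
    return 1.0 / s  # candidate right-invariant density


left_A, left_gA, left_Ag = (measure(left_density, f) for f in (in_A, in_gA, in_Ag))
right_A, right_gA, right_Ag = (measure(right_density, f) for f in (in_A, in_gA, in_Ag))
print(left_A, left_gA, left_Ag)
print(right_A, right_gA, right_Ag)
```

In this sketch the density $\sigma^{-2}$ assigns the same mass to $A$ and $gA$ but not to $Ag$, while the density $\sigma^{-1}$ assigns the same mass to $A$ and $Ag$ but not to $gA$.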

The right-Haar prior usually enjoys more optimality properties than the left-Haar prior. Some optimality properties of left-Haar priors are given in Datta and Ghosh (1995b). In Example 1, for the location-scale family of distributions, the right-Haar prior is Bernardo's reference prior when either $\mu$ or $\sigma$ is the parameter of interest, while the other parameter is the nuisance parameter. Also, it is shown in Datta, Ghosh and Mukerjee (2000) that for the location-scale family of distributions, the right-Haar prior yields exact matching of the coverage probabilities of Bayesian credible intervals and the corresponding frequentist confidence intervals when either $\mu$ or $\sigma$ is the parameter of interest, while the
other parameter is the nuisance parameter.

For simplicity, we demonstrate this only for the normal example. Let $X_1,\ldots,X_n\mid\mu,\sigma^2$ be i.i.d. $N(\mu,\sigma^2)$, where $n\ge2$. With the right-Haar prior $\pi_R(\mu,\sigma)\propto\sigma^{-1}$, the marginal posterior distribution of $\mu$ is Student's $t$ with location parameter $\bar X=\sum_{i=1}^{n}X_i/n$, scale parameter $S/\sqrt n$, where $(n-1)S^2=\sum_{i=1}^{n}(X_i-\bar X)^2$, and degrees of freedom $n-1$. Hence, if $\mu_{1-\alpha}$ denotes the $100(1-\alpha)$th percentile of this marginal posterior, then

$$1-\alpha=P(\mu\le\mu_{1-\alpha}\mid X_1,\ldots,X_n)
=P[\sqrt n(\mu-\bar X)/S\le\sqrt n(\mu_{1-\alpha}-\bar X)/S\mid X_1,\ldots,X_n]
=P[t_{n-1}\le\sqrt n(\mu_{1-\alpha}-\bar X)/S],$$

so that $\sqrt n(\mu_{1-\alpha}-\bar X)/S=t_{n-1;1-\alpha}$, the $100(1-\alpha)$th percentile of $t_{n-1}$. Now

$$P(\mu\le\mu_{1-\alpha}\mid\mu,\sigma)=P[\sqrt n(\bar X-\mu)/S\ge-t_{n-1;1-\alpha}\mid\mu,\sigma]=1-\alpha=P(\mu\le\mu_{1-\alpha}\mid X_1,\ldots,X_n).$$

This provides the exact coverage matching probability for $\mu$.

Next, with the same setup, when $\sigma^2$ is the parameter of interest, its marginal posterior is Inverse Gamma$((n-1)/2,(n-1)S^2/2)$. Now, if $\sigma^2_{1-\alpha}$ denotes the $100(1-\alpha)$th percentile of this marginal posterior, then $\sigma^2_{1-\alpha}=(n-1)S^2/\chi^2_{n-1;1-\alpha}$, where $\chi^2_{n-1;1-\alpha}$ is the upper $100(1-\alpha)\%$ point of the $\chi^2_{n-1}$ distribution, that is, $P(\chi^2_{n-1}\ge\chi^2_{n-1;1-\alpha})=1-\alpha$. Now

$$P(\sigma^2\le\sigma^2_{1-\alpha}\mid\mu,\sigma)=P[(n-1)S^2/\sigma^2\ge\chi^2_{n-1;1-\alpha}\mid\mu,\sigma]=1-\alpha,$$

showing once again the exact coverage matching.
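
Both exact-matching statements for the normal model can be illustrated by simulation. The sketch below fixes $n=5$ and $\alpha=0.05$ and hard-codes two table values, which are assumptions of the example rather than computed quantities: the 95th percentile of Student's $t$ with 4 degrees of freedom, and the 5th percentile of $\chi^2_4$ (the point exceeded with probability 0.95). The true parameter values and the number of replications are arbitrary.

```python
import random

T_4_095 = 2.131847     # 95th percentile of Student's t, 4 d.f. (table value)
CHI2_4_005 = 0.710723  # 5th percentile of chi-square, 4 d.f. (table value)


def right_haar_coverage(mu, sigma, reps, seed=1):
    """Frequentist coverage of the 95% posterior quantiles of mu and sigma^2
    under the prior proportional to 1/sigma, for samples of size n = 5."""
    n = 5
    rng = random.Random(seed)
    hits_mu = 0
    hits_var = 0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(x) / n
        s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
        mu_upper = xbar + T_4_095 * (s2 / n) ** 0.5  # posterior 95% quantile of mu
        var_upper = (n - 1) * s2 / CHI2_4_005        # posterior 95% quantile of sigma^2
        if mu <= mu_upper:
            hits_mu += 1
        if sigma**2 <= var_upper:
            hits_var += 1
    return hits_mu / reps, hits_var / reps


cov_mu, cov_var = right_haar_coverage(mu=-2.0, sigma=3.0, reps=20000)
print(round(cov_mu, 3), round(cov_var, 3))
```

Both estimated coverages should be close to 0.95 up to Monte Carlo error, whatever values of $\mu$ and $\sigma$ are used to generate the data.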

The general definition of a right-invariant Haar density on $\mathcal G$, which we will denote by $h_r$, must satisfy $\int_{Ag}h_r(x)\,dx=\int_Ah_r(x)\,dx$, where $Ag=\{g^*g:g^*\in A\}$. Similarly, a left-invariant Haar density on $\mathcal G$, which we will denote by $h_l$, must satisfy $\int_{gA}h_l(x)\,dx=\int_Ah_l(x)\,dx$, where $gA=\{gg^*:g^*\in A\}$. An alternate representation of the right- and left-Haar densities is given by $P^{h_r}(Ag)=P^{h_r}(A)$ and $P^{h_l}(gA)=P^{h_l}(A)$, respectively.

It is shown in Halmos (1950) and Nachbin (1965) that the right- and left-invariant Haar densities exist and are unique up to a multiplicative constant. Berger (1985) provides calculation of $h_r$ and $h_l$ in a very general framework. He points out that if $\mathcal G$ is isomorphic to the parameter space $\Theta$, then one can construct right- and left-invariant Haar priors on the parameter space $\Theta$. A very substantial account of invariant Haar densities is available in Datta and Ghosh (1995b). Severini, Mukerjee and Ghosh (2002) have demonstrated the exact matching property of right-invariant Haar densities in a prediction context under fairly general conditions.

5.2 Moment Matching Priors 

Here we discuss a new matching criterion which 
we will refer to as the "moment matching crite- 
rion." For a regular family of distributions, the clas- 
sic article of Bernstein and Von Mises (see, e.g., 
Ferguson, 1996, page 141; Ghosh, Delampady and 
Samanta, 2006, page 104) proved the asymptotic 
normality of the posterior of a parameter vector cen- 
tered around the maximum likelihood estimator or 
the posterior mode and variance equal to the in- 
verse of the observed Fisher information matrix eval- 
uated at the maximum likelihood estimator or the 
posterior mode. We utilize the same asymptotic ex- 
pansion to find priors which can provide high order 
matching of the moments of the posterior mean and 
the maximum likelihood estimator. For simplicity of 
exposition, we shall primarily confine ourselves to 
priors which achieve the matching of the first mo- 
ment, although it is easy to see how higher order 
moment matching is equally possible. 

The motivation for moment matching priors stems 
from several considerations. First, these priors lead 
to posterior means which share the asymptotic opti- 
mality of the MLE's up to a high order. In particu- 
lar, if one is interested in asymptotic bias or MSE re- 
duction of the MLE's through some adjustment, the 
same adjustment applies directly to the posterior 
means. In this way, it is possible to achieve Bayes- 
frequentist synthesis of point estimates. The second 
important aspect of these priors is that they provide new viable alternatives to Jeffreys' prior, motivated by the proposed criterion, even for real-valued parameters in the absence of nuisance parameters.
A third motivation, which will be made clear later in 
this section, is that with moment matching priors, it 
is possible to construct credible regions for param- 
eters of interest based only on the posterior mean 
and the posterior variance, which match the maxi- 
mum likelihood based confidence intervals to a high 
order of approximation. We will confine ourselves 
primarily to regular families of distributions.

Let $X_1,X_2,\ldots,X_n\mid\theta$ be independent and identically distributed with common density function $f(x\mid\theta)$, where $\theta\in\Theta$, some interval in the real line. Consider a general class of priors $\pi(\theta)$, $\theta\in\Theta$, for $\theta$. Throughout, it is assumed that both $f$ and $\pi$ satisfy all the needed regularity conditions as given in Johnson (1970) and Bickel and Ghosh (1990).

Let $\hat\theta_n$ denote the maximum likelihood estimator of $\theta$. Under the prior $\pi$, we denote the posterior mean of $\theta$ by $\hat\theta_n^{\pi}$. The formal asymptotic expansion given in Section 2 now leads to

$$\hat\theta_n^{\pi}=\hat\theta_n+n^{-1}\Bigl(\frac{\hat a_3}{2\hat I_n^2}+\frac{\pi'(\hat\theta_n)}{\hat I_n\pi(\hat\theta_n)}\Bigr)+O_p(n^{-3/2}),$$

where $\hat a_3$ and $\hat I_n$ are defined in Theorem 1. Since $\hat a_3$ converges in probability to $E[d^3\log f/d\theta^3\mid\theta]=-g_3(\theta)$, the law of large numbers and consistency of the MLE now give

$$n(\hat\theta_n^{\pi}-\hat\theta_n)\stackrel{P}{\to}-\frac{g_3(\theta)}{2I^2(\theta)}+\frac{\pi'(\theta)}{I(\theta)\pi(\theta)}.$$

With the choice $\pi(\theta)=\exp\bigl[\frac12\int^{\theta}\frac{g_3(t)}{I(t)}\,dt\bigr]$, one gets $\hat\theta_n^{\pi}-\hat\theta_n=O_p(n^{-3/2})$. We will denote this prior as $\pi_M(\theta)$.

Ghosh and Liu (2011) have shown that if $\phi$ is a one-to-one function of $\theta$, then the moment matching prior $\pi_M(\phi)$ for $\phi$ is given by $\pi_M(\phi)=\pi_M(\theta)\,|d\theta/d\phi|^{3/2}$. We now see an application of this result.

Example 6 (Continued). Consider the regular one-parameter exponential family of densities given by $f(x\mid\theta)=\exp[\theta x-\psi(\theta)+h(x)]$. For the canonical parameter $\theta$, noting that $I(\theta)=\psi''(\theta)$ and $g_3(\theta)=\psi'''(\theta)=I'(\theta)$, $\pi_M(\theta)=\exp[\frac12\int I'(\theta)/I(\theta)\,d\theta]=I^{1/2}(\theta)$, which is Jeffreys' prior. On the other hand, for the population mean $\phi=\psi'(\theta)$, which is a strictly increasing function of $\theta$ [since $\psi''(\theta)=V(X\mid\theta)>0$], the moment matching prior is $\pi_M(\phi)=I(\phi)$. In particular, for the binomial proportion $p$, one gets the Haldane prior $\pi_H(p)\propto p^{-1}(1-p)^{-1}$, which is the same as Hartigan's (1964, 1998) maximum likelihood prior. However, for the canonical parameter $\theta=\mathrm{logit}(p)$, whereas we get Jeffreys' prior, Hartigan (1964, 1998) gets the Laplace uniform(0, 1) prior.
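
For the binomial proportion the moment matching property can be seen exactly rather than asymptotically. The sketch below uses the standard conjugate update, under which a Beta$(a,b)$ prior and $x$ successes in $n$ trials give a Beta$(a+x,b+n-x)$ posterior; the Haldane prior is treated as the improper Beta$(0,0)$ limit, which yields a proper posterior only when $0<x<n$. The data values are hypothetical.

```python
from fractions import Fraction


def beta_posterior_mean(a, b, x, n):
    """Posterior mean of p under a Beta(a, b) prior after x successes in n trials."""
    a, b = Fraction(a), Fraction(b)
    return (a + x) / (a + b + n)


x_obs, n_obs = 7, 20  # illustrative data
mle = Fraction(x_obs, n_obs)
haldane_mean = beta_posterior_mean(0, 0, x_obs, n_obs)
jeffreys_mean = beta_posterior_mean(Fraction(1, 2), Fraction(1, 2), x_obs, n_obs)
print(mle, haldane_mean, jeffreys_mean)
```

Under the Haldane prior the posterior mean coincides with the MLE $x/n$, whereas under Jeffreys' prior it is $(x+\frac12)/(n+1)$, which differs from the MLE by a term of order $n^{-1}$.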

Remark 4. It is now clear that a fundamental difference between priors obtained by matching probabilities and those obtained by matching moments is the lack of invariance of the latter under one-to-one reparameterization. It may be interesting to find conditions under which a moment matching prior agrees with Jeffreys' prior $I^{1/2}(\theta)$ or the uniform constant prior. The former holds if and only if $g_3(\theta)=I'(\theta)$, while the latter holds if and only if $g_3(\theta)=0$.

The if parts of the above results are immediate from the definition of $\pi_M(\theta)$. To prove the only if parts, note that if $\pi_M(\theta)=I^{1/2}(\theta)$, first taking logarithms, and then differentiating with respect to $\theta$, one gets $\frac{g_3(\theta)}{2I(\theta)}=\frac{I'(\theta)}{2I(\theta)}$ so that $g_3(\theta)=I'(\theta)$. On the other hand, if $\pi_M(\theta)=c$, then taking logarithms, and then differentiating with respect to $\theta$, one gets $g_3(\theta)=0$.

The above approach can be extended to the matching of higher moments as well. Noting that $V^{\pi}(\theta\mid X_1,\ldots,X_n)=E^{\pi}[(\theta-\hat\theta_n)^2\mid X_1,\ldots,X_n]-(\hat\theta_n^{\pi}-\hat\theta_n)^2$, it follows immediately that under the moment matching prior $\pi_M$, $V^{\pi}(\theta\mid X_1,\ldots,X_n)=(n\hat I_n)^{-1}+O_p(n^{-2})$. This fact helps construction of credible intervals for $\theta$, the parameter of interest, centered at the posterior mean and scaled by the posterior standard deviation, which enjoy the same asymptotic properties as the credible interval centered at the MLE and scaled by the square root of the reciprocal of the observed Fisher information number.

6. SUMMARY AND CONCLUSION 

As mentioned in the Introduction, this article pro- 
vides a selective review of objective priors reflect- 
ing my own interest and familiarity with the top- 
ics. I am well aware that many important contribu- 
tions are left out. For instance, I have discussed only 
the two-group reference priors of Bernardo (1979). 
A more appealing later contribution by Berger and 
Bernardo (1992b) provided an algorithm for the con- 
struction of multi-group reference priors when these 
groups are arranged in accordance to their order 
of importance. In particular, the one-at-a-time reference priors, as advocated by these authors, have proved to be quite useful in practice. Ghosal (1997,
1999) provided the construction of reference priors 
in nonregular cases, while a formal definition of ref- 
erence priors encompassing both regular and non- 
regular cases has recently been proposed by Berger, 
Bernardo and Sun (2009). 

Regarding probability matching priors, we have 
discussed only the quantile matching criterion. There 
are several others, possibly equally important prob- 
ability matching criteria. Notable among these are 
the highest posterior density matching criterion as 
well as matching via inversion of test statistics, such 
as the likelihood ratio test statistic, Rao statistic 
or the Wald statistic. Extensive discussion of such 
matching priors is given in Datta and Mukerjee (2004). Datta et al. (2000) constructed matching priors via the prediction criterion, and related exact results in this context are available in Fraser and Reid (2002). The issue of matching priors in the context of conditional inference has been discussed quite extensively in Reid (1996).

A different class of priors called "the maximum 
likelihood prior" was developed by Hartigan (1964, 
1998). Roughly speaking, these priors are found by 
maximizing the expected distance between the prior 
and the posterior under a truncated Kullback-Leibler distance. Like the proposed moment matching priors, the maximum likelihood prior densities, when they exist, result in posterior means that differ from the MLEs by an asymptotically negligible amount. I have alluded to some of these priors as a comparison with other priors as given in this paper.

With the exception of the right- and left-invariant Haar priors, the derivation of the remaining priors is based essentially on the asymptotic expansion of the posterior density as well as the shrinkage argument of J. K. Ghosh. This approach provides a nice
unified tool for the development of objective priors. 
I believe very strongly that many new priors will be 
found in the future by either a direct application or 
slight modification of these tools. 

The results of this article show that Jeffreys' prior 
is a clear winner in the absence of nuisance pa- 
rameters for most situations. The only exception is 
the chi-square divergence where different priors may 
emerge. But that corresponds only to one special 
case, namely, the boundary of the class of divergence 
priors, while Jeffreys' prior continues its optimality 
in the interior. In the presence of nuisance param- 
eters, my own recommendation is to find two- or 
multi-group reference priors following the algorithm 
of Berger and Bernardo (1992a), and then narrow 
down this class of priors by finding their intersec- 
tion with the class of probability matching priors. 
This approach can even lead to a unique objective 
prior in some situations. Some simple illustrations 
are given in this article. I also want to point out the 
versatility of reference priors. For example, for non- 
regular models, Jeffreys' general rule prior does not 
work. But as shown in Ghosal (1997) and Berger, 
Bernardo and Sun (2009), one can extend the def- 
inition of reference priors to cover these situations 
as well. 

The examples given in this paper are purposely 
quite simplistic to aid understanding mainly of read- 
ers not familiar at all with the topic. Quite rightfully, 
they can be criticized as somewhat stylized. Both 
reference and probability matching priors, however, 
have been developed for more complex problems of 



practical importance. Among others, I may refer to 
Berger and Yang (1994), Berger, De Oliveira and 
Sanso (2001), Ghosh and Heo (2003), Ghosh, Car- 
lin and Srivastava (1994) and Ghosh, Yin and Kim 
(2003). The topics of these papers include time series 
models, spatial models and inverse problems, such 
as linear calibration and problems in bioassay, in 
particular, slope ratio and parallel line assays. One 
can easily extend this list. A very useful source for 
all these papers is Bernardo (2005).